Master’s thesis on the thematic classification of participation contributions with Active Learning

For his Master’s thesis in Computer Science at Heinrich Heine University Düsseldorf, Boris Thome examined the classification of participation contributions according to the topics they contain. The thesis continues the work of Julia Romberg and Tobias Escher by investigating a finer-grained classification of contributions into subcategories.

Summary

Political authorities in democratic countries regularly consult the public on specific issues, but subsequently evaluating the contributions requires substantial human resources, which often leads to inefficiencies and delays in the decision-making process. One proposed solution is to support human analysts by grouping the contributions thematically through automated means.

While supervised machine learning would naturally lend itself to the task of classifying citizens’ proposals according to predefined topics, the amount of training data required is often prohibitive given the idiosyncratic nature of most public participation processes. One potential way to reduce the amount of training data needed is active learning, in which the model itself selects the contributions that a human should annotate next. In our previous work, we were able to show that active learning can significantly reduce the manual annotation effort for coding top-level categories. In this work, we investigated whether this advantage persists when the top-level categories are subdivided into subcategories. A particular challenge arises from the fact that some subcategories are very rare and therefore cover only a few contributions.
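To make the idea of active learning more concrete, the following minimal sketch shows one common query strategy, least-confidence uncertainty sampling. It is an illustration only: the summary does not state which query strategies were evaluated in the thesis, and the function name, the batch size, and the probability values are hypothetical.

```python
from typing import List, Sequence


def least_confidence_query(
    probabilities: Sequence[Sequence[float]], batch_size: int
) -> List[int]:
    """Return the indices of the `batch_size` unlabeled items for which the
    model's highest predicted class probability is lowest."""
    # Confidence of the model for each item = probability of its top class.
    confidence = [max(row) for row in probabilities]
    # Rank items from least to most confident (stable for ties).
    ranked = sorted(range(len(confidence)), key=lambda i: confidence[i])
    return ranked[:batch_size]


# Hypothetical predicted class probabilities for four unlabeled contributions
# and three categories (illustrative values, not taken from the thesis).
pool_probabilities = [
    [0.90, 0.05, 0.05],  # model is fairly sure
    [0.40, 0.35, 0.25],
    [0.55, 0.30, 0.15],
    [0.34, 0.33, 0.33],  # model is almost undecided
]

# Indices of the two contributions a human annotator would label next.
to_annotate = least_confidence_query(pool_probabilities, batch_size=2)
```

In a full active learning loop, the selected contributions would be labeled by a human, added to the training set, and the classifier retrained before the next query round.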

To evaluate various methods, data from online participation processes in three German cities were used. The results show that the automatic classification of subcategories is significantly more difficult than the classification of the main categories. This is due to the high number of possible subcategories (30 in the dataset under consideration), which are very unevenly distributed. Further research is therefore required to find a practical solution for the flexible assignment of subcategories using machine learning.

Publication

Thome, Boris (2022): Thematische Klassifikation von Partizipationsverfahren mit Active Learning. Masterarbeit am Institut für Informatik, Lehrstuhl für Datenbanken und Informationssysteme, der Heinrich-Heine-Universität Düsseldorf. (Download)