Doctoral thesis (full text) of Julia Romberg
Engaging citizens in decision-making processes is a widely used instrument in democracies. On the one hand, such public participation processes aim to make the process better informed and thus potentially improve its outcome, i.e. the resulting policies, through the ideas and suggestions of citizens. On the other hand, involving the citizenry is an attempt to increase public acceptance of the decisions made.
When public officials evaluate the often large quantities of citizen input, they regularly face challenges due to limited resources (e.g. lack of personnel, time constraints). For textual contributions, natural language processing (NLP) offers the opportunity to provide automated support for the evaluation, which to date is still carried out mainly by hand. Although some research has already been conducted in this area, important questions have so far been insufficiently addressed or have remained entirely unanswered.
My dissertation, which I successfully completed in 2023, therefore focused on how these research gaps can be closed with the help of text classification methods. Particular emphasis was placed on two sub-tasks: the thematic structuring and the argument analysis of public participation data.
The thesis begins with a systematic literature review of previous approaches to the machine-assisted evaluation of textual contributions (for more insights, please refer to this article). Given the identified shortage of language resources, it then presents the newly created, multidimensionally annotated CIMT corpus, which facilitates the development of text classification models for German-language public participation (for more insights, please refer to this article).
The first focus is on the thematic structuring of public input, with particular regard to the fact that many public participation processes are unique in terms of content and context. To make customized automation models worthwhile, we leverage active learning, which reduces the manual workload by optimizing the selection of training data. In a comparison across three participation processes, we show that transformer-based active learning can significantly reduce manual classification effort for processes of a few hundred contributions or more, while maintaining high accuracy and affordable runtimes (for more insights, please refer to this article). We then turn to criteria of practical applicability that conventional evaluation does not capture. By proposing measures that reflect the class-related demands users place on data acquisition, we provide insights into the behavior of different active learning strategies on class-imbalanced datasets, a common characteristic of collections of public input.
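To make "optimizing training data selection" concrete, here is a minimal sketch of pool-based active learning with least-confidence uncertainty sampling. It is an illustration under stated assumptions and not the thesis's implementation. Least-confidence is only one common query strategy, and the thesis compares several. The names `select_queries`, `active_learning_loop`, `train_and_predict`, and `oracle` are hypothetical. The classifier (transformer-based in the thesis) is abstracted as a callable that returns class probabilities for the unlabeled contributions, and the oracle stands in for a human annotator.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical model interface: train on (text, label) pairs, then return one
# class-probability vector per unlabeled text (a transformer in the thesis).
Predictor = Callable[[List[Tuple[str, str]], List[str]], List[List[float]]]


def least_confidence_scores(probs: Sequence[Sequence[float]]) -> List[float]:
    """Uncertainty of each item: 1 minus the probability of its most likely class."""
    return [1.0 - max(p) for p in probs]


def select_queries(probs: Sequence[Sequence[float]], batch_size: int) -> List[int]:
    """Return positions of the `batch_size` items the model is least sure about."""
    scores = least_confidence_scores(probs)
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:batch_size]


def active_learning_loop(
    pool: List[str],
    train_and_predict: Predictor,
    oracle: Callable[[str], str],
    batch_size: int,
    rounds: int,
) -> Dict[int, str]:
    """Label only the most uncertain contributions in each round.

    Assumption: `train_and_predict` copes with an empty training set in the
    first round (e.g. by returning uniform probabilities).
    """
    labeled: Dict[int, str] = {}  # pool index -> label given by the annotator
    for _ in range(rounds):
        unlabeled = [i for i in range(len(pool)) if i not in labeled]
        if not unlabeled:
            break
        train = [(pool[i], lab) for i, lab in labeled.items()]
        probs = train_and_predict(train, [pool[i] for i in unlabeled])
        for j in select_queries(probs, batch_size):
            idx = unlabeled[j]
            labeled[idx] = oracle(pool[idx])  # manual annotation step
    return labeled
```

For example, `select_queries([[0.9, 0.1], [0.55, 0.45], [0.7, 0.3]], 2)` returns `[1, 2]`, the two contributions whose top-class probability is lowest. Only the queried items reach the annotator, which is where the savings in manual effort come from. How evenly these queries cover the classes depends on the strategy and the data, which is why class-related measures are relevant for imbalanced collections.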
Afterward, we shift the focus to the analysis of citizens’ reasoning. Our first contribution is a robust model for detecting argumentative structures across different public participation processes. Our approach improves upon previous techniques in this application domain for recognizing argumentative sentences and, in particular, for classifying them as argument components (for more insights, please refer to this article). Following that, we explore the machine prediction of argument concreteness. Here, we account for the subjective nature of argumentation by presenting a first approach to modeling different perspectives in the input representation of machine learning models in argumentation mining (for more insights, please refer to this article).