evaluation with AI – CIMT – Citizen Involvement in Mobility Transitions

5th workshop for practitioners on recommendations for action generated from the results

On October 31, November 7 and December 11, workshops for practitioners were held at which we presented recommendations for action and discussed them with the participants. The participants were administrative staff responsible for citizen participation in the various municipalities with which we cooperated and who were involved in the planning and implementation of the participation processes that we examined in our research.

In the course of our investigation of various open consultative participation formats on the topic of urban mobility planning, we were able to generate various findings from which theses can be derived. In a further step, we combined these numerous theses into seven recommendations for action, which are intended to support the implementation of consultative participation formats. At the beginning of the workshops, we used an example to outline the path from insight to recommendations for action before the practitioners themselves got involved and were able to leave comments in our mind map. With the help of digital sticky notes, they added their opinions, additions, criticisms and experiences to the individual recommendations for action. This was followed by discussions on individual recommendations for action. Important points were:

The usefulness of the recommendations for action in participation practice as tools for classifying one’s own participation

The usefulness of the recommendations for action in participation practice as an aid to justifying the importance of participation

Overall, the experts agreed that the results of our research are very helpful in communicating the challenges of citizen participation and the resulting consequences to policymakers. In addition, many of the practitioners noted that they found the link to the results of the research clear and structured. Some had the impression that participation and specific consultations are viewed critically in municipal administrations. They share the view that our results can help to train administrative staff and make them aware of the usefulness of participation procedures.

Major points of discussion in the workshops were

the specificity of the recommendations for action and the inclusion of examples in the presentation of results

a potentially stronger emphasis on the transparency aspect through the recommendations for action

an arrangement of the recommendations for action in the chronological order of a participation process

The planners note that the recommendations for action could be made more specific in order to clarify their practical relevance and make them more likely to be applied. In the form in which they were presented, they were rather general and always in strict relation to the results of the research. It was suggested that the recommendations be underpinned with examples from specific participation formats. For example, our research objects could be mentioned, which form the basis of our findings, the theses and thus also the recommendations for action.

Although reformulations and concretizations have been made, examples cannot be found directly in the recommendations. This would have been complicated, especially with regard to the partly abstract quantitative results. However, some examples from the specific participation processes form the basis for the development of the recommendations for action and are sometimes used to underline their importance.

Another aspect that the experts raised is that different tasks and questions arise at different times in a planning process. Some of the seven recommendations for action relate to the planning, implementation or evaluation of the procedures. It was suggested that specific attention should be paid to the participation process and that the recommendations be arranged according to the different stages. This was implemented in the order of the recommendations.

At the end of the workshops, we asked for suggestions for the publication of the results. It was emphasized how important it is for the planners to be able to find these recommendations easily and it was recommended to use existing networks in order to disseminate the results as widely as possible.

We would like to thank the practitioners for their time and important input – and to a large extent for their years of cooperation. We gained many important insights and suggestions that will help us in our work on a helpful and practical publication of recommendations for action.

AI for the evaluation of participation? The potential of language models to recognise modes of transport in participation contributions

In this article in the journal Internationales Verkehrswesen, Laura Mark, Julia Romberg and Tobias Escher present a language model that can be used to reliably recognise modes of transport in participation contributions. They show that supervised machine learning can usefully support the evaluation of participation contributions in mobility-related online participation processes.

Summary

Consultations are an important part of transport planning and can help to integrate knowledge from the public into the planning process. However, online formats in particular often result in large volumes of contributions, the thorough evaluation of which is resource-intensive. It is hoped that the use of AI will support this.

The language model presented in this article is based on the concept of supervised machine learning for text classification. Pre-trained models are re-fined using smaller data sets. In this way, a model can be adapted to a specific area of application, such as mobility-related consultation processes.

A pre-trained German-language version of the high-performance RoBERTa language model was used as a starting point. Using a categorisation scheme that mainly distinguishes between the modes of transport mentioned, 1,700 contributions from seven transport planning consultation processes were manually coded. The resulting data was used partly as training data for fine-tuning the language model and partly for evaluation.

Results

Overall, it was shown that language models already available today are suitable for supporting the evaluation of consultation processes in practice. The language model developed here for recognising the modes of transport can serve as the basis for a specific application in municipal planning practice.
The post-trained RoBERTa language model is very effective at assigning the appropriate modes of transport. The model presented by us can always reliably assign well over 90% of the entries correctly to the modes of transport they contain.
For the processes on whose contributions the model had been trained, an average of 97% of the categories could be correctly assigned (on a separate test set). For contributions from other transport-related participation procedures, the appropriate modes of transport could still be assigned very reliably with an accuracy of 91 to 94%.
The performance of the model therefore hardly deteriorates when it is applied to previously unknown data from mobility-related participation procedures. This means that manual coding in advance can be omitted, at least for similarly structured participation procedures, which significantly reduces the effort involved.

Publication

Mark, Laura; Romberg, Julia; Escher, Tobias (2024): KI zur Auswertung von Beteiligung. In: Internationales Verkehrswesen 76 (1), S. 12–16. DOI: 10.24053/iv-2024-0003

Annotation and Provision of Datasets

As part of our project, we worked on the manual annotation of a large number of datasets with the aim of supporting the development of AI methods for evaluating public participation contributions.

Supervised machine learning requires training datasets in order to learn patterns related to the respective codings. In the area of citizen participation, there is a lack of comprehensively coded German-language datasets. In order to meet this need, we have therefore worked on annotating German-language participation processes from the field of mobility according to four dimensions:

Firstly, we have thematically classified contributions according to modes of transportation, other requirements for public space, and defects that need to be fixed immediately.
Second, we coded processes by argumentative sentences and divided them into premises and conclusions.
Thirdly, we have assigned argumentative units of meaning to how concrete they are.
Fourthly, we have coded textual location information.

A more detailed description of the datasets – as of June 2022 – can be found in our publication: Romberg, Julia; Mark, Laura; Escher, Tobias (2022, June). A Corpus of German Citizen Contributions in Mobility Planning: Supporting Evaluation Through Multidimensional Classification. Since then, we have continued to work on the thematic coding of the datasets and revised our scheme of modes of transport.

The following table shows the current status of annotation and is updated on an ongoing basis (in German): Google-Sheet

In accordance with our open source policy, the annotated datasets are made available to the public under Creative Commons CC BY-SA License when possible.

A number of publications have been produced based on these data sets. These can be found at https://www.cimt-hhu.de/gruppe/romberg/romberg-veroeffentlichungen/.

Master’s thesis on the thematic classification of participation contributions with Active Learning

As part of his Master’s thesis in the MA Computer Science at Heinrich Heine University Düsseldorf, Boris Thome dealt with the classification of participation contributions according to the topics they contain. This thesis continues the work of Julia Romberg and Tobias Escher by examining a finer classification of contributions according to subcategories.

Summary

Political authorities in democratic countries regularly consult the public on specific issues but subsequently evaluating the contributions requires substantial human resources, often leading to inefficiencies and delays in the decision-making process. Among the solutions proposed is to support human analysts by thematically grouping the contributions through automated means.

While supervised machine learning would naturally lend itself to the task of classifying citizens’ proposal according to certain predefined topics, the amount of training data required is often prohibitive given the idiosyncratic nature of most public participation processes. One potential solution to minimize the amount of training data is the use of active learning. In our previous work, we were able to show that active learning can significantly reduce the manual annotation effort for coding top-level categories. In this work, we subsequently investigated whether this advantage is still given when the top-level categories are subdivided into subcategories. A particular challenge arises from the fact that some of the subcategories can be very rare and therefore only cover a few contributions.

In the evaluation of various methods, data from online participation processes in three German cities was used. The results show that the automatic classification of subcategories is significantly more difficult than the classification of the main categories. This is due to the high number of possible subcategories (30 in the dataset under consideration), which are very unevenly distributed. In conclusion, further research is required to find a practical solution for the flexible assignment of subcategories using machine learning.

Publication

Thome, Boris (2022): Thematische Klassifikation von Partizipationsverfahren mit Active Learning. Masterarbeit am Institut für Informatik, Lehrstuhl für Datenbanken und Informationssysteme, der Heinrich-Heine-Universität Düsseldorf. (Download)

Master’s thesis on the automated classification of arguments in participation contributions

As part of her master’s thesis in the MA Computer Science at Heinrich Heine University Düsseldorf, Suzan Padjman dealt with the classification of argumentation components in participation contributions. This thesis continues our team’s previous work by looking at cases in which argumentative sentences can contain both a premise and a conclusion.

Summary

Public participation processes allow citizens to engage in municipal decision-making processes by expressing their opinions on specific issues. Municipalities often only have limited resources to analyze a possibly large amount of textual contributions that need to be evaluated in a timely and detailed manner. Automated support for the evaluation is therefore essential, e.g. to analyze arguments.

When classifying argumentative sentences according to type (here: premise or conclusion), it can happen that one sentence contains several components of an argument. In this case, there is a need for multi-label classification, in which more than one category can be assigned.

To solve this problem, different methods for multi-label classification of argumentation components were compared (SVM, XGBoost, BERT and DistilBERT). The results showed that BERT models can achieve a macro F1 score of up to 0.92. The models exhibit robust performance across different datasets – an important indication of the practical usability of such methods.

Publication

Padjman, Suzan (2022): Mining Argument Components in Public Participation Processes. Masterarbeit am Institut für Informatik, Lehrstuhl für Datenbanken und Informationssysteme, der Heinrich-Heine-Universität Düsseldorf. (Download)

Project work on the automated recognition of locations in participation contributions

As part of her project work in the MA Computer Science at Heinrich Heine University Düsseldorf, Suzan Padjman worked on the development of methods for the automated recognition of textually described location information in participation procedures.

Summary

In the context of the mobility transition, consultative processes are a popular tool for giving citizens the opportunity to represent and contribute their interests and concerns. Especially in the case of mobility-related issues, an important analysis aspect of the collected contributions is which locations (e.g. roads, intersections, cycle paths or footpaths) are problematic and in need of improvement in order to promote sustainable mobility. Automated identification of such locations has the potential to support the resource-intensive manual evaluation.

The aim of this work was therefore to find an automated solution for identifying locations using methods from natural language processing (NLP). For this purpose, a location was defined as the description of a specific place of a proposal, which could be marked on a map. Examples of locations are street names, city districts and clearly assignable places, such as “in the city center” or “at the exit of the main train station”. Pure descriptions without reference to a specific place were not considered as locations. Methodologically, the task was regarded as a sequence labeling task, as locations often consist of several consecutive tokens, so-called word sequences.

A comparison of different models (spaCy NER, GermanBERT, GBERT, dbmdz BERT, GELECTRA, multilingual BERT, multilingual XLM-RoBERTa) on two German-language participation datasets on cycling infrastructure in Bonn and Cologne Ehrenfeld showed that GermanBERT achieves the best results. This model can recognize tokens that are part of a textual location description with a promising macro F1 score of 0.945. In future work, it is planned to convert the recognized text phrases into geocoordinates in order to depict the recognized location of citizens’ proposals on a map.

Publication

Padjman, Suzan (2021): Unterstützung der Auswertung von verkehrsbezogenen Bürger*innenbeteiligungsverfahren durch die automatisierte Erkennung von Verortungen. Projektarbeit am Institut für Informatik, Lehrstuhl für Datenbanken und Informationssysteme, der Heinrich-Heine-Universität Düsseldorf. (Download)

Measures for a more user-centered evaluation of classification quality

This paper, presented at the 14th International Conference on Recent Advances in Natural Language Processing, introduces measures to evaluate practical requirements for the usability of AI-based tools.

Summary

A solution to limited annotation budgets is active learning (AL), a collaborative process between humans and machines for the strategic selection of a small but informative set of examples. While current approaches optimize AL from a machine learning perspective, we argue that successful real-world deployment requires additional criteria that target the second pillar of AL: the human annotators and their needs. For example, the usefulness of AL methods in text classification is typically assessed using common performance measures such as accuracy or F₁. However, such measures fall short when applied to real-world datasets, which often contain a higher number of imbalanced classes. In these scenarios, additional criteria—such as quickly identifying all classes (e.g., topics) or detecting rare cases—become important. We therefore introduce four measures that reflect the class-specific requirements users have for data collection and content analysis.

In a comprehensive comparison of uncertainty-, diversity-, and hybrid-based data selection strategies across six different datasets, we find, for example, that strong F₁ performance does not necessarily correspond to complete class coverage (i.e., not all topics are identified), and that different data selection strategies exhibit varying strengths and weaknesses with respect to class-specific requirements. Our empirical findings highlight that a holistic perspective is essential when evaluating AL approaches to ensure their practical usefulness. To this end, standard measures for evaluating machine-based text classification must be complemented by those that better reflect user needs.

Selected results

This publication proposes four new class-specific evaluation metrics for AL approaches that take into account how well and how quickly rare or all classes are detected. These criteria are not captured in detail by standard metrics such as F₁.
The new measures enable practice-relevant insights into performance, particularly on datasets with varying frequently occurring classes and a wide range of different classes – characteristics that are common in real-world applications (e.g., topic detection in public participation processes).
It becomes clear that the choice of an appropriate AL strategy should not be based solely on standard performance measures. For example, the top-performing approaches according to the F₁ score may not ensure that all classes are identified, despite this being a crucial requirement in the automated analysis of participation contributions: no topic should be overlooked, no voice should go unheard. The measures we developed can provide additional guidance in strategy selection and thus support the choice of practice-oriented solutions.

Publication

Romberg, J. (2023). Mind the User! Measures to More Accurately Evaluate the Practical Value of Active Learning Strategies. Proceedings of the 14th International Conference on Recent Advances in Natural Language Processing, 996–1006. https://aclanthology.org/2023.ranlp-1.107/

Overview of Methods for Computational Text Analysis to Support the Evaluation of Contributions in Public Participation

In this publication in Digital Government: Research and Practice Julia Romberg and Tobias Escher offer a review of the computational techniques that have been used in order to support the evaluation of contributions in public participation processes. Based on a systematic literature review, they assess their performance and offer future research directions.

Abstract

Public sector institutions that consult citizens to inform decision-making face the challenge of evaluating the contributions made by citizens. This evaluation has important democratic implications but at the same time, consumes substantial human resources. However, until now the use of artificial intelligence such as computer-supported text analysis has remained an under-studied solution to this problem. We identify three generic tasks in the evaluation process that could benefit from natural language processing (NLP). Based on a systematic literature search in two databases on computational linguistics and digital government, we provide a detailed review of existing methods and their performance. While some promising approaches exist, for instance to group data thematically and to detect arguments and opinions, we show that there remain important challenges before these could offer any reliable support in practice. These include the quality of results, the applicability to non-English language corpora and making algorithmic models available to practitioners through software. We discuss a number of avenues that future research should pursue that can ultimately lead to solutions for practice. The most promising of these bring in the expertise of human evaluators, for example through active learning approaches or interactive topic modelling.

Key findings

There are a number of tasks in the evaluation processes that could be supported through Natural Language Processing (NLP). Broadly speaking, these are i) detecting (near) duplicates, ii) grouping of contributions by topic and iii) analyzing the individual contributions in depth. Most of the literature in this review focused on the automated recognition and analysis of arguments, one particular aspect of the task of in-depth analysis of contribution.
We provide a comprehensive overview of the datasets used as well as the algorithms employed and aim to assess their performance. Generally, despite promising results so far the significant advances of NLP techniques in recent years have barely been exploited in this domain.
A particular gap is that few applications exist that would enable practitioners to easily apply NLP to their data and reap the benefits of these methods.
The manual labelling efforts required for training machine learning models risk any efficiency gains from automation.
We suggest a number of fruitful future research avenues, many of which draw upon the expertise of humans, for example through active learning or interactive topic modelling.

Publication

Romberg, Julia; Escher, Tobias (2023): Making Sense of Citizens’ Input through Artificial Intelligence. In: Digital Government: Research and Practice, Artikel 3603254. DOI: 10.1145/3603254.

Supporting the Manual Evaluation Process of Citizen’s Contributions Through Natural Language Processing

Doctoral thesis (full text) of Julia Romberg

Engaging citizens in decision-making processes is a widely implemented instrument in democracies. On the one hand, such public participation processes serve the goal of achieving a more informed process and thus to potentially improve the process outcome, i.e. resulting policies, through the ideas and suggestions of the citizens. On the other hand, involving the citizenry is an attempt to increase the public acceptance of decisions made.

As public officials try to evaluate the often large quantities of citizen input, they regularly face challenges due to restricted resources (e.g. lack of personnel,time limitations). When it comes to textual contributions, natural language processing (NLP) offers the opportunity to provide automated support for the evaluation, which to date is still carried out mainly manually. Although some research has already been conducted in this area, important questions have so far been insufficiently addressed or have remained completely unanswered.

My dissertation, which I successfully completed in 2023, therefore focused on how existing research gaps can be overcome with the help of text classification methods. A particular emphasis was placed on the sub-tasks of thematic structuring and argument analysis of public participation data.

The thesis begins with a systematic literature review of previous approaches to the machine-assisted evaluation of textual contributions (for more insights, please refer to this article). Given the identified shortage of language resources, subsequently the newly created multidimensionally annotated CIMT corpus to facilitate the development of text classification models for German-language public participation is presented (for more insights, please refer to this article).

The first focus is on the thematic structuring of public input, particularly considering the uniqueness of many public participation processes in
terms of content and context. To make customized models for automation worthwhile, we leverage the concept of active learning to reduce manual workload by optimizing training data selection. In a comparison across three participation processes, we show that transformer-based active learning can significantly reduce manual classification efforts for process sizes starting at a few hundred contributions while maintaining high accuracy and affordable runtimes (for more insights, please refer to this article). We then turn to the criteria of practical applicability that conventional evaluation does not encompass. By proposing measures that reflect class-related demands users place on data acquisition, we provide insights into the behavior of different active learning strategies on class-imbalanced datasets, which is a common characteristic in collections of public input.

Afterward, we shift the focus to the analysis of citizens’ reasoning. Our first contribution lies in the development of a robust model for the detection of argumentative structures across different processes of public participation. Our approach improves upon previous techniques in the application domain for the recognition of argumentative sentences and, in particular, their classification as argument components (for more insights, please refer to this article). Following that, we explore the machine prediction of argument concreteness. In this context, the subjective nature of argumentation was accounted for by presenting a first approach to model different perspectives in the input representation of machine learning in argumentation mining (for more insights, please refer to this article).

publication

Romberg, Julia. (2023): Machine-assisted text classification of public participation contributions. Dissertation am Institut für Informatik an der Heinrich-Heine-Universität Düsseldorf.

CAIS Working Group: AI in digital public participation

As participants in a workshop organised by the Center for Advanced Internet Studies (CAIS) in Bochum, Julia Romberg and Tobias Escher presented results of the CIMT research on AI-supported evaluation of participation contributions and discussed further possibilities for using artificial intelligence to support public participation with experts from research as well as participation practice. It became clear that the practitioners see potential not only in the evaluation (output), but also in the activation of participants (input) and in the support of interactions (throughput) in participation processes. Nevertheless, these potentials face challenges and risks, including the adequate technical implementation and ensuring data protection and non-discrimination.

The workshop was organised by Dr Dennis Frieß and Anke Stoll and took place from 8 to 10 February 2023 in Bochum. Further information can be found on the website of the Düsseldorf Institute for Internet and Democracy.