In this paper, published in the proceedings of the Language Resources and Evaluation Conference, Julia Romberg, Laura Mark and Tobias Escher introduce a collection of annotated datasets intended to support the development of machine learning approaches for evaluating public participation contributions.
Abstract
Political authorities in democratic countries regularly consult the public in order to allow citizens to voice their ideas and concerns on specific issues. When trying to evaluate the (often large number of) contributions by the public in order to inform decision-making, authorities regularly face challenges due to restricted resources.
We identify several tasks whose automated support can help in the evaluation of public participation. These are i) the recognition of arguments, more precisely premises and their conclusions, ii) the assessment of the concreteness of arguments, iii) the detection of textual descriptions of locations in order to assign citizens’ ideas to a spatial location, and iv) the thematic categorization of contributions. To enable future research efforts to develop techniques addressing these four tasks, we introduce the CIMT PartEval Corpus, a new publicly-available German-language corpus that includes several thousand citizen contributions from six mobility-related planning processes in five German municipalities. For each of these tasks, the corpus provides annotations that were previously unavailable in German for the domain of public participation, either entirely or in this scope and variety.
Key findings
- The CIMT PartEval Argument Component Corpus comprises 17,852 sentences from German public participation processes annotated as non-argumentative, premise, or major position.
- The CIMT PartEval Argument Concreteness Corpus consists of 1,127 argumentative text spans that are annotated according to three levels of concreteness: low, intermediate, and high.
- The CIMT PartEval Geographic Location Corpus consists of 4,830 locations and the GPS coordinates for 2,529 proposals from public consultations.
- The CIMT PartEval Thematic Categorization Corpus relies on a new hierarchical categorization scheme for mobility that captures modes of transport (non-motorized transport: cycling, walking, scooters; motorized transport: local public transport, long-distance public transport, commercial transport) and a number of specifications, such as moving or stationary traffic, new services, and inter- and multimodality. In total, 697 documents have been annotated according to this scheme.
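The label sets named in the findings above can be sketched as simple Python structures. This is only an illustration: the identifiers (ArgumentComponent, Concreteness, TRANSPORT_MODES, parent_mode) are hypothetical and do not reflect the file format or naming used in the published datasets, and the mapping covers only the modes of transport listed above, not the further specifications of the scheme.

```python
from enum import Enum
from typing import Optional


class ArgumentComponent(Enum):
    """Sentence-level labels named for the Argument Component Corpus."""
    NON_ARGUMENTATIVE = "non-argumentative"
    PREMISE = "premise"
    MAJOR_POSITION = "major position"


class Concreteness(Enum):
    """Concreteness levels named for the Argument Concreteness Corpus."""
    LOW = "low"
    INTERMEDIATE = "intermediate"
    HIGH = "high"


# Two-level view of the modes of transport as listed in the text above.
# Assumption: the published scheme may contain further categories.
TRANSPORT_MODES = {
    "non-motorized transport": ["cycling", "walking", "scooters"],
    "motorized transport": [
        "local public transport",
        "long-distance public transport",
        "commercial transport",
    ],
}


def parent_mode(mode: str) -> Optional[str]:
    """Return the top-level category of a mode, or None if it is not listed."""
    for parent, children in TRANSPORT_MODES.items():
        if mode in children:
            return parent
    return None
```

For example, parent_mode("cycling") returns "non-motorized transport", while a mode that is not listed here returns None.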
Publication
Romberg, Julia; Mark, Laura; Escher, Tobias (2022, June). A Corpus of German Citizen Contributions in Mobility Planning: Supporting Evaluation Through Multidimensional Classification. In Proceedings of the Language Resources and Evaluation Conference (pp. 2874–2883), Marseille, France. European Language Resources Association. https://aclanthology.org/2022.lrec-1.308
Corpus available at
https://github.com/juliaromberg/cimt-argument-mining-dataset
https://github.com/juliaromberg/cimt-argument-concreteness-dataset
https://github.com/juliaromberg/cimt-geographic-location-dataset
https://github.com/juliaromberg/cimt-thematic-categorization-dataset