The core objective of my research is to develop application-oriented procedures for the (partially) automated classification of citizen contributions to public participation (such as proposals or comments), in order to support the evaluation of such engagement processes in both research and practice.
The research focuses on developing strategies to automatically structure participation data, either by the content of the contributions or by more formal aspects of them. Natural Language Processing (NLP) and machine learning methods are used for this purpose.
For structuring by content, two approaches can be investigated: classification and clustering. First, if a set of categories is predefined for a given participation data set and the category assignments of some of the documents (i.e. individual citizens’ contributions) are known a priori, supervised learning approaches can be used to train an appropriate classifier. The trained model can then be applied to categorize unlabeled and new contributions automatically. Second, if the topics discussed in the participation project have not been categorized beforehand, there is no labeled data, so the topics must be identified with an unsupervised approach. Participation data can be sorted into groups using topic modeling or clustering techniques. The resulting structure forms a classification scheme, which can in turn be applied to new data. To facilitate user-driven evaluation of participation processes, structuring the data by formal aspects, not only by content, can also be helpful. Such formal aspects include, but are not limited to, readability, civility, constructiveness, novelty and, more sophisticated still, the degree of feasibility or sustainability of a contribution. Focusing on formal aspects of participation leads to a different classification task.
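To make the supervised case concrete, the following is a minimal sketch of training a classifier on a few labeled contributions and applying it to a new, unlabeled one. It uses a multinomial Naive Bayes model with add-one (Laplace) smoothing, written in plain Python. This is a standard baseline swapped in purely for illustration, not the procedure developed in this research. The class name `NaiveBayesClassifier`, the category labels `traffic` and `green_space`, and the example contributions are hypothetical; the texts are English stand-ins for German participation data.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text: str) -> list[str]:
    # Lowercase and keep runs of letters only (Unicode-aware, so umlauts survive).
    return re.findall(r"[^\W\d_]+", text.lower())


class NaiveBayesClassifier:
    """Hypothetical illustration: multinomial Naive Bayes with add-one smoothing."""

    def fit(self, texts, labels):
        # Number of training documents per category (used for the prior).
        self.class_counts = Counter(labels)
        # Word frequencies per category.
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        self.total_words = {c: sum(cnt.values()) for c, cnt in self.word_counts.items()}
        return self

    def predict(self, text: str) -> str:
        tokens = tokenize(text)
        n_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            # Log prior plus the sum of smoothed log likelihoods of the tokens.
            score = math.log(count / n_docs)
            denom = self.total_words[label] + len(self.vocab)
            for tok in tokens:
                if tok in self.vocab:  # words never seen in training are ignored
                    score += math.log((self.word_counts[label][tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical labeled contributions (toy data for illustration only).
train = [
    ("More bike lanes on the main street would make cycling safer", "traffic"),
    ("Please reduce car traffic near the school", "traffic"),
    ("The bus connection to the station should run more often", "traffic"),
    ("Plant more trees in the park", "green_space"),
    ("The park needs more benches and shade from trees", "green_space"),
    ("Create a community garden next to the playground", "green_space"),
]
texts, labels = zip(*train)
clf = NaiveBayesClassifier().fit(texts, labels)

# Apply the trained model to a new, unlabeled contribution.
print(clf.predict("Safer bike lanes near the school"))  # traffic
```

Naive Bayes is used here only because it is a common, transparent baseline that can be written without external libraries; in practice it would likely be compared against other classifiers, such as logistic regression or fine-tuned language models.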
Both classification and clustering are extensive research areas. To avoid compromising on the level of detail, this PhD dissertation examines only the automated classification of citizen contributions in public participation; research on clustering is postponed until after the doctorate.
In parallel, the developed approaches will be implemented in user-friendly software that gives the interested public the opportunity to apply these structuring approaches in practice. This software will also include visualizations of (aggregated) participation data.
Particular challenges for my research project arise from the characteristics of the German language, the informal language used in (online) participation procedures, data scarcity, the brevity of contributions, and the shortcomings of known topic models in separating strongly related topics.