
Special Session II

Big Data in Semantics and Pragmatics

The analysis of large amounts of corpus data is no longer an exception in Semantics and Pragmatics. This turn towards large corpora is accompanied not only by the analysis of complex structures found in the data, but also by new methods for working with these data. While semantic theories still predominantly use logic-based representations for describing meanings, distributional approaches to semantics use concepts from linear algebra such as vectors, matrices and tensors as meaning representations, together with the corresponding operations for identifying meanings in language use. The availability of large (annotated) corpora also makes it possible to gain new insights into meaning and language use with machine learning methods, be it supervised learning from a training set of meaning-related labeled examples, unsupervised learning for finding hidden structures in unlabeled semantic data, or reinforcement learning for learning to act in a pragmatically appropriate way in dialogue. The probabilistic turn in formal pragmatic theories requires extensive data sets, possibly collected experimentally, for testing these theories; machine learning techniques, however, have not played a prominent role there so far.
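As a minimal illustration of the distributional view mentioned above, the following Python sketch compares toy word vectors with cosine similarity. The vocabulary, context words and counts are invented for illustration only and do not come from any particular corpus or model.

    import numpy as np

    # Toy distributional representations: co-occurrence counts of three words
    # with three hypothetical context words ("cup", "drink", "morning").
    # Real models use high-dimensional vectors learned from large corpora.
    vec_coffee = np.array([12.0, 3.0, 9.0])
    vec_tea    = np.array([10.0, 4.0, 7.0])
    vec_logic  = np.array([0.0, 1.0, 0.0])

    def cosine(u, v):
        """Cosine similarity, a standard distributional measure of semantic relatedness."""
        return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

    print(cosine(vec_coffee, vec_tea))    # high value: similar distributional profiles
    print(cosine(vec_coffee, vec_logic))  # low value: dissimilar profiles

In such approaches, relatedness of meaning is read off the geometry of the vector space rather than from logical representations.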

This special session aims to bring together researchers with an interest in using large data sets for semantic and pragmatic analyses. Topics of this session include all aspects of creating and annotating corpora and other data sets for semantic and pragmatic questions, and, in particular, using such corpora and data sets for testing the corresponding semantic and pragmatic theories, be it by confronting these theories with corpus data or by learning connections that the theories should account for.

  • Invited speaker: Raquel Fernandez (University of Amsterdam)
  • Organizer: Ralf Klabunde (RUB, Linguistic Data Science Lab)