Use Case
The heureCLÉA project aims to develop a so-called “digital heuristic”: a functional module that supports literary scholars in interpreting and annotating texts. To achieve this, the module “learns” from human-made annotations in order to progress toward the automated generation of textual markup. We explore this approach through one example: the semantic analysis of time-related phenomena in narrative texts. We envisage the following three phases of development:
- automation of markup-tasks of low complexity
- exploration and analysis of more complex, manually as well as automatically generated markup versions
- computer-aided modeling and generation of markup versions
Within this framework we develop a digital heuristic that offers the user automated procedures for textual analysis. These procedures shall generate markup suggestions that can then be validated by human users. To create its output, the system dynamically deduces probabilistic rules for interpretation and annotation from manually created markup. This heuristic module will finally be integrated into a web-based application that enables the creation of non-deterministic, collaborative textual markup.
Resources: Input
Data:
- non-literary corpora with normalized time-related expressions: WikiWarsDE, Time4SMS, Time4SCI (available at http://dbs.ifi.uni-heidelberg.de/index.php?id=129)
- a corpus of literary texts in German and English, and multi-modal graphic narratives
Procedures:
- extraction and normalization of time-related expressions with HeidelTime [5]
- manual annotation of time-related phenomena in the corpus in CATMA
- testing of visualization options with the help of Voyant tools [6]
- dependency analysis of texts in the context of time-related markup, together with methods for correlating temporal phenomena and temporal markup
Resources: Output
- a corpus of literary texts in German and English, as well as multi-modal graphic narratives, with time-related annotations
- implementation of a web-based platform for non-deterministic, collaborative markup
- implementation of a collaborative work environment as a “digital heuristic” for the semi-automated, semi-interactive generation of time-related markup
- implementation of different options for visualizing and exploring detected time structures in documents or in large quantities of markup
Method
The taxonomy of time-related phenomena in narrative texts is based on narratological categories [1]. The subsequent tagging of temporal representations and relations in a corpus of narratives is carried out by:
- the system HeidelTime through automated extraction and normalization of time-related phenomena [2];
- collaborative, manual tagging according to the narratological taxonomy with the web application CATMA [3].
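To make the phrase “extraction and normalization” concrete, the following is a minimal sketch of the general idea only. It is not HeidelTime's implementation or API: the month lexicon, the pattern, and the function name `extract_dates` are assumptions made for illustration, and the sketch covers only explicit English dates of the form “3 March 1912”.

```python
import re
from datetime import date

# Hypothetical month lexicon; a real temporal tagger relies on far richer,
# language-specific resources.
MONTHS = {
    "january": 1, "february": 2, "march": 3, "april": 4,
    "may": 5, "june": 6, "july": 7, "august": 8,
    "september": 9, "october": 10, "november": 11, "december": 12,
}

# Matches explicit dates such as "3 March 1912".
DATE_PATTERN = re.compile(
    r"\b(\d{1,2}) (" + "|".join(MONTHS) + r") (\d{4})\b", re.IGNORECASE
)


def extract_dates(text):
    """Return (start, end, surface form, ISO value) for each explicit date."""
    results = []
    for match in DATE_PATTERN.finditer(text):
        day, month_name, year = match.groups()
        try:
            value = date(int(year), MONTHS[month_name.lower()], int(day)).isoformat()
        except ValueError:
            continue  # skip impossible dates such as "31 February 1900"
        results.append((match.start(), match.end(), match.group(0), value))
    return results


print(extract_dates("He left on 3 March 1912 and returned later."))
```

Extraction here means locating the character span of the expression; normalization means mapping its surface form to a standard value. Relative expressions such as “later” are deliberately out of scope for this sketch, and they are the kind of phenomenon for which the manual, narratologically informed tagging is needed.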
By means of machine learning processes, a subset of the markup data created in CATMA will be analyzed to detect regularities. The machine learning approach uses supervised learning methods based on manually created and verified time-markups.
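As an illustration of how regularities might be derived from manually verified markup, the sketch below estimates a simple conditional probability P(tag | cue word) from annotated segments and uses it to propose tags for human validation. This is an assumed toy procedure, not the project's actual learning method; the tag names “analepsis” and “prolepsis”, the functions `learn_rules` and `suggest`, and the thresholds are all hypothetical.

```python
from collections import Counter, defaultdict


def learn_rules(annotated_segments, min_support=2):
    """Estimate P(tag | cue word) from (text, tag) pairs.

    Returns {cue: (most frequent tag, relative frequency)} for every cue
    word that occurs in at least `min_support` annotated segments.
    """
    counts = defaultdict(Counter)
    for text, tag in annotated_segments:
        # Count each word once per segment (whitespace tokenization only).
        for cue in set(text.lower().split()):
            counts[cue][tag] += 1
    rules = {}
    for cue, tag_counts in counts.items():
        total = sum(tag_counts.values())
        if total >= min_support:
            tag, freq = tag_counts.most_common(1)[0]
            rules[cue] = (tag, freq / total)
    return rules


def suggest(text, rules, threshold=0.75):
    """Propose (cue, tag, probability) triples for a human to validate."""
    suggestions = []
    for cue in text.lower().split():
        if cue in rules and rules[cue][1] >= threshold:
            suggestions.append((cue, *rules[cue]))
    return suggestions


# Hypothetical, manually verified training segments.
training = [
    ("years earlier she had left", "analepsis"),
    ("he had left the town earlier", "analepsis"),
    ("later he would regret it", "prolepsis"),
    ("she would later return", "prolepsis"),
]
rules = learn_rules(training)
print(suggest("much later she would know", rules))
```

The output is a list of suggestions rather than final annotations, mirroring the idea that automatically generated markup is offered to the user and only becomes part of the corpus after validation.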
To derive rules and integrate them into HeidelTime, dependency analysis will then be applied at the text level, along with methods for correlating temporal expressions across different contexts.
The program module heureCLÉA will then be provided to users in a collaborative work environment [4] as a “digital heuristic” that supports the semi-automated, semi-interactive generation of temporal markup. The main component of the module will be HeidelTime, which is used for the automated specification of temporal markup in texts.
Finally, the temporal markup automatically created by heureCLÉA will be tested by users in the humanities, and the achieved functionality will be evaluated using methods from information science.
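One common way to quantify such functionality is to compare automatically generated annotations with a manually created gold standard using precision, recall, and F1. The sketch below assumes exact matching of hypothetical (start, end, tag) triples; the text does not specify which measures the project will use, so this is an illustration rather than the project's evaluation protocol.

```python
def evaluate(gold, predicted):
    """Compare two collections of (start, end, tag) annotations by exact match.

    Returns (precision, recall, f1).
    """
    gold, predicted = set(gold), set(predicted)
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical annotations: character offsets plus a tag.
gold = [(0, 5, "analepsis"), (10, 15, "prolepsis"), (20, 25, "analepsis")]
predicted = [(0, 5, "analepsis"), (10, 15, "analepsis")]
precision, recall, f1 = evaluate(gold, predicted)
print(precision, recall, f1)
```

Exact matching is strict: a suggestion with the right span but the wrong tag counts as an error. For interpretive, non-deterministic markup, a more lenient scheme (partial span overlap, or agreement with any one of several annotators) may be more appropriate.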
Cooperations
- “Digital Commons Initiative” (Prof. Stéfan Sinclair, Prof. Jan Christoph Meister)
- envisaged: TextGrid, DARIAH, CLARIN-D
- further partners with expertise in the fields of extraction and normalization of temporal expressions and events are still being sought
References
[1] Genette, Gérard (1972): “Discours du récit.” In: ibid. Figures III. Paris, 67-282.
[1] Lahn, Silke & Meister, Jan Christoph (2008): Einführung in die Erzähltextanalyse. Stuttgart: Metzler.
[1] Meister, Jan Christoph & Schernus, Wilhelm (eds.) (2011): Time. From Concept to Narrative Construct. A Reader. Berlin, New York: de Gruyter.
[2] HeidelTime: dbs.ifi.uni-heidelberg.de/heideltime
[3] CATMA: www.catma.de
[4] Gius, Evelyn; Meister, Jan Christoph; Petris, Marco & Schüch, Lena (2012): “Crowdsourcing meaning: a hands-on introduction to CLÉA, the Collaborative Literature Éxploration and Annotation Environment.” In: Digital Humanities 2012. Conference Abstracts, 24.
[5] Strötgen, Jannik & Gertz, Michael (2013): “Multilingual and Cross-domain Temporal Tagging.” In: Language Resources and Evaluation, Springer.
[6] Voyant: http://voyant-tools.org/