Open Access Data

All research and development carried out at CSRI is released publicly under Creative Commons ShareAlike licenses. This is part of the institute's adherence to Open Science, including open source software and open content and data. Watch this space for a growing list of our research results that we open up for (re)use by all.


  • CMR Annotated Data Corpus (Download)
    XML COSMOROE annotation files of two TV travel series episodes, with the corresponding speech-to-text transcripts and acoustic event identification.
  • CMR Annotation Specification File (Download)
    Sample XML file for use with the ANVIL annotation tool, following the COSMOROE annotation framework.
  • CMR Language-Visual Element Parallel Corpus (Download)
    XML files that capture pairs of language elements (words, phrases) and visual elements (objects, body movements, gestures) that are close in time and semantically associated; the semantic association is labelled. The annotation covers two TV travel series episodes.
  • CMR Utterance-Visuals Parallel Corpus (Download)
    XML files that capture pairs of language (whole utterances) and images (objects, body movements, gestures) that co-occur in time in two TV travel series videos.
  • CMR EN-EL Parallel Corpus (Download)
    XML files with the parallel English-Greek corpus of two TV travel series episodes, comprising translations of the transcribed speech and of the object, body movement and gesture tags.
  • CMR Labeled Object Dataset (Download)
    Object images from TV travel series, tagged with a verbal category and annotated for object contour.
  • CMR Labeled Body Movement & Gesture Dataset (Download)
    Body movement and gesture video segments from TV travel series, tagged with a verbal category.


POETICON Lithic Tools Experiment Data (PLT)

  • PLT Audiovisual Recordings & Transcripts Set (Download)
    Audiovisual files of the recordings made during the POETICON Lithic Tool attribute and affordance elicitation cognitive experiment. The files are accompanied by the corresponding speech-to-text transcripts.
  • PLT Gesture & Exploratory Action Set (upcoming)
    Video segments of gestures and exploratory actions performed by participants in the POETICON Lithic Tool Experiment recordings, along with XML files containing all related time-offset annotations. The set covers a variety of movement types.
  • PLT Verbal Object Attribute and Affordance Matrix (upcoming)
    Matrix of everyday objects, their attributes and their affordances, as elicited in the POETICON Lithic Tool experiment; detailed counts and descriptive statistics are also provided.
  • PLT Argumentation Corpus (upcoming)
    A collection of arguments elicited in the POETICON Lithic Tool experiment; it comprises justifications, conditionals and analogies provided by participants using free speech. The arguments are open-domain, in the sense that they refer to common-sense knowledge rather than domain-specific, expert knowledge. They include justifications of an object's category given particular object attributes, conditions under which an object may be used for specific actions, and so on.
  • PLT Semantic Annotation Set (upcoming)
    XML files with semantic annotation of the verbal reports provided in the POETICON Lithic Tool Experiment; the semantic annotation comprises categorisation of language units into object, perceptual feature type and use categories, following the corresponding annotation scheme developed in the framework of the experiment.
  • PLT Semantic & Gesture Annotation Specification File (upcoming)
    XML files with the annotation schemes developed for the semantic and movement-related analysis of the POETICON Lithic Tool Experiment Data.

PRAXICON Data Downloads

  • PLT & CMR Populated PRAXICON Database (upcoming)
    Concepts and associations extracted and inferred from the COSMOROE Data Corpus and the POETICON Lithic Tools Experiment corpus, in English and Greek.