COSMOROE Language-Visual Element Parallel Corpus

Title: COSMOROE Language-Visual Element Parallel Corpus
Publication Type: Audiovisual
Year of Publication: 2015
Authors: Pastra, K, Balta, E, Dimitrakis, P
Edition: Version 1.0
Publisher: Cognitive Systems Research Institute
Type: Annotated Audiovisual Dataset
Other Numbers: ISLRN: 768-898-765-797-3
Keywords: cross media semantics, image-language, multimedia, vision-language integration

XML files that capture pairs of language elements (words, phrases) and visual elements (objects, body movements, gestures) that are close in time and semantically associated; each semantic association is labelled. The files annotate two TV travel series episodes, following the COSMOROE multimedia dialectics framework. The annotation comprises:

(a) Direct associations between language elements (words or phrases in speech or in graphic/scene text) and visual elements (objects, body movements or gestures). The language elements are provided with the language context (full utterance) in which they appear in the files, while the visual elements are provided through tags attributed to their content and links to the associated image/video segment files. For objects, links to the original images and to images with overlaid object contours are provided; for movements, the movement complements (agent, tool, affected object, action location) are provided, as well as the gesture type for gestures. Objects are tagged with a label denoting an object category, while movements are tagged with a goal-denoting label. All language elements, language context and labels are available in English and Greek.

(b) Indirect associations between language and visual elements, i.e. associations whose semantic path goes through inferred concepts that are not present in the multimodal discourse. In these cases, the indirect association comprises a chain of inferred relations. The first and last inferred relations in the chain comprise the verbal or visual argument (respectively) and an inferred conceptual argument. Any in-between inferred relations comprise only conceptual arguments. Tail recursion characterises the inferred relations and this is how a chain is formed, i.e. the argument of one inferred relation is shared with the next inferred relation; conceptual arguments drive this process. Thus, in the case of indirect associations, two physically present elements that are close in time (a language element and a visual element) are associated through inferred relations to concepts (which could have been realised physically through any modality).

The pairs can be searched through the online COSMOROE search engine. This corpus and associated ones are available for download with detailed readme files at the following URL.
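
The chain structure described for indirect associations can be illustrated with a minimal Python sketch. It is not the corpus's XML schema or any COSMOROE tool: the class names, the `kind` values, the `is_valid_chain` helper and the example labels are all hypothetical, and the sketch assumes that a chain needs at least two inferred relations (one anchored on the verbal argument, one on the visual argument).

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Argument:
    """Hypothetical argument of an inferred relation."""
    label: str
    kind: str  # assumed values: "verbal", "visual" or "conceptual"


@dataclass(frozen=True)
class InferredRelation:
    """Hypothetical inferred relation linking two arguments."""
    source: Argument
    target: Argument


def is_valid_chain(chain: List[InferredRelation]) -> bool:
    """Check the constraints stated in the description of indirect associations."""
    if len(chain) < 2:  # assumption: one relation per physically present element
        return False
    first, last = chain[0], chain[-1]
    # First relation: verbal argument plus an inferred conceptual argument.
    if (first.source.kind, first.target.kind) != ("verbal", "conceptual"):
        return False
    # Last relation: inferred conceptual argument plus the visual argument.
    if (last.source.kind, last.target.kind) != ("conceptual", "visual"):
        return False
    # In-between relations comprise only conceptual arguments.
    for rel in chain[1:-1]:
        if rel.source.kind != "conceptual" or rel.target.kind != "conceptual":
            return False
    # Each relation shares an argument with the next one.
    return all(a.target == b.source for a, b in zip(chain, chain[1:]))


# Invented example labels, not taken from the corpus.
word = Argument("harvest", "verbal")
concept = Argument("agriculture", "conceptual")
obj = Argument("tractor", "visual")
example_chain = [InferredRelation(word, concept), InferredRelation(concept, obj)]
print(is_valid_chain(example_chain))
```

Modelling the shared argument as equality between one relation's `target` and the next relation's `source` is one way to read "the argument of one inferred relation is shared with the next"; the actual files may encode this differently.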

Short Title: CMR Language-Visual Element Parallel Corpus
Citation Key: Pastra2015