Toward a format-neutral annotation store

Article ID	Journal	Published Year	Pages	File Type
4973680	Computer Speech & Language	2017	27 Pages	PDF

Abstract

Sharing speech corpora and their annotations is desirable, in order to maximise the value gained from the expense and hard work involved in transcribing and annotating them. However, differences in conventions and formats are barriers to sharing data: text conventions conflict, file formats differ, and annotation ontologies do not match up. Using a 'pivot' form to store annotations in a tool- and format-neutral manner can alleviate many of these difficulties. There are several candidates for the pivot form, including the Annotation Graph model, which meets most of the requirements for a pivot. The LaBB-CAT software's implementation of Annotation Graphs extends the model to handle the remaining unmet requirements, and these extensions make it possible to define an annotation API that makes automating the conversion, querying, and manipulation of annotations easier.
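As a rough illustration of why an annotation graph can act as a pivot, the following Python sketch models anchors (points in the signal) and annotations (labelled arcs between two anchors on a named layer), then round-trips a simple tier-based representation through the graph. The names `AnnotationGraph`, `from_tiers` and `to_tiers`, and the tier format itself, are hypothetical and are not LaBB-CAT's API. The sketch also assumes every anchor has a known time offset, whereas the full Annotation Graph model permits anchors without offsets.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Anchor:
    """A point in the signal. This sketch assumes the offset (seconds) is always known."""
    id: str
    offset: float


@dataclass
class Annotation:
    """A labelled arc from a start anchor to an end anchor on a named layer."""
    id: str
    layer: str
    label: str
    start: str  # anchor id
    end: str    # anchor id


@dataclass
class AnnotationGraph:
    anchors: Dict[str, Anchor] = field(default_factory=dict)
    annotations: List[Annotation] = field(default_factory=list)

    def anchor_at(self, offset: float) -> str:
        """Return the anchor at this offset, creating it if needed, so that
        coinciding boundaries on different layers share one anchor.
        (Exact float comparison is a simplification for this sketch.)"""
        for anchor in self.anchors.values():
            if anchor.offset == offset:
                return anchor.id
        anchor_id = f"a{len(self.anchors)}"
        self.anchors[anchor_id] = Anchor(anchor_id, offset)
        return anchor_id

    def add(self, layer: str, label: str, start: float, end: float) -> Annotation:
        annotation = Annotation(f"e{len(self.annotations)}", layer, label,
                                self.anchor_at(start), self.anchor_at(end))
        self.annotations.append(annotation)
        return annotation

    def layer(self, name: str) -> List[Annotation]:
        """Annotations on one layer, ordered by the offset of their start anchor."""
        items = [a for a in self.annotations if a.layer == name]
        return sorted(items, key=lambda a: self.anchors[a.start].offset)


# Hypothetical tool-specific format: layer name -> list of (start, end, label).
Tiers = Dict[str, List[Tuple[float, float, str]]]


def from_tiers(tiers: Tiers) -> AnnotationGraph:
    """Import the hypothetical tier format into the pivot graph."""
    graph = AnnotationGraph()
    for layer_name, intervals in tiers.items():
        for start, end, label in intervals:
            graph.add(layer_name, label, start, end)
    return graph


def to_tiers(graph: AnnotationGraph) -> Tiers:
    """Export the pivot graph back to the hypothetical tier format."""
    names = sorted({a.layer for a in graph.annotations})
    return {
        name: [(graph.anchors[a.start].offset, graph.anchors[a.end].offset, a.label)
               for a in graph.layer(name)]
        for name in names
    }


# Usage: a word layer and a phone layer whose boundaries partly coincide.
tiers: Tiers = {
    "word": [(0.0, 0.2, "the"), (0.2, 0.6, "cat")],
    "phone": [(0.0, 0.1, "D"), (0.1, 0.2, "@"), (0.2, 0.3, "k"),
              (0.3, 0.5, "{"), (0.5, 0.6, "t")],
}
graph = from_tiers(tiers)
print(len(graph.anchors))        # 6 shared anchors for 7 annotations
print(to_tiers(graph) == tiers)  # True: the round trip is lossless here
```

Because the word and phone layers share anchors wherever their boundaries coincide, the graph records that "cat" starts exactly where "k" does; a converter writing to another format could read that relationship from the graph rather than re-deriving it from timestamps.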

Keywords

Speech corpora; Interoperability