Article ID | Journal | Published Year | Pages | File Type |
---|---|---|---|---|
4973680 | Computer Speech & Language | 2017 | 27 Pages | |
Abstract
Sharing speech corpora and their annotations is desirable, because it maximises the value gained from the expense and hard work of transcribing and annotating them. However, differences in conventions and formats are barriers to sharing data: text conventions conflict, file formats differ, and annotation ontologies do not match up. Using a 'pivot' form to store annotations in a tool- and format-neutral manner can alleviate many of these difficulties. There are several candidates for the pivot form, including the Annotation Graph model, which meets most of the requirements of a pivot. The LaBB-CAT software's implementation of Annotation Graphs incorporates some extensions to the model, which handle the remaining unmet requirements and make it possible to define an annotation API that eases automated conversion, querying, and manipulation of annotations.
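The idea of a pivot form can be illustrated with a minimal sketch of an annotation graph: anchors are nodes that may carry a time offset, and annotations are labelled arcs between two anchors on a named layer. This is not LaBB-CAT's API or its extended model; the names `Anchor`, `Annotation`, `AnnotationGraph`, `on_layer`, and `to_tsv` are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class Anchor:
    """A node in the graph; offset is in seconds, or None if unaligned."""
    id: str
    offset: Optional[float] = None


@dataclass(frozen=True)
class Annotation:
    """A labelled arc between two anchors on a named layer (hypothetical)."""
    layer: str
    label: str
    start: Anchor
    end: Anchor


@dataclass
class AnnotationGraph:
    annotations: List[Annotation] = field(default_factory=list)

    def add(self, layer: str, label: str, start: Anchor, end: Anchor) -> Annotation:
        ann = Annotation(layer, label, start, end)
        self.annotations.append(ann)
        return ann

    def on_layer(self, layer: str) -> List[Annotation]:
        """Query: all annotations on one layer, in insertion order."""
        return [a for a in self.annotations if a.layer == layer]

    def to_tsv(self, layer: str) -> str:
        """Convert one layer to a simple tab-separated export format."""
        return "\n".join(
            f"{a.start.offset}\t{a.end.offset}\t{a.label}"
            for a in self.on_layer(layer)
        )


# Two words sharing the anchor a1, plus a part-of-speech tag on another layer.
a0, a1, a2 = Anchor("a0", 0.0), Anchor("a1", 0.42), Anchor("a2", 0.9)
graph = AnnotationGraph()
graph.add("word", "hello", a0, a1)
graph.add("word", "world", a1, a2)
graph.add("pos", "UH", a0, a1)

print(graph.to_tsv("word"))
```

Because every tool-specific format would be read into and written out of the same structure, each format needs only one importer and one exporter rather than a converter for every pair of formats.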
Related Topics
Physical Sciences and Engineering
Computer Science
Signal Processing
Authors
Robert Fromont,