uima-user mailing list archives

From Richard Eckart de Castilho <...@apache.org>
Subject Re: big offsets efficiency, and multiple offsets
Date Wed, 04 Dec 2013 14:35:31 GMT
Why is it bad if you cannot inherit from Annotation? getCoveredText() will not work anyway
if you are working with audio/video data.

-- Richard

On 04.12.2013, at 12:31, Jens Grivolla <j+asf@grivolla.net> wrote:

> Hi, we're now starting the EUMSSI project, which deals with integrating annotation layers coming from audio, video and text analysis.
> We're thinking of basing it all on UIMA, having different views with separate audio, video, transcribed text, etc. sofas.  In order to align the different views we need to have a common offset specification that allows us to map e.g. character offsets to the corresponding timestamps.
> In order to avoid float timestamps (which would mean we can't derive from Annotation) I was thinking of using audio/video frames with e.g. 100 or 1000 frames/second.  Annotation has begin and end defined as signed 32-bit ints, leaving sufficient room for very long documents even at 1000 fps, so I don't think we're going to run into any limits there.  Is there anything that could become problematic when working with offsets that are probably quite a bit larger than what is typically found with character offsets?
> Also, can I have several indexes on the same annotations in order to work with character offsets for text analysis, but then efficiently query for overlapping annotations from other views based on frame offsets?
> Btw, if you're interested in the project we have a writeup (condensed from the project proposal) here: https://dl.dropboxusercontent.com/u/4169273/UIMA_EUMSSI.pdf and there will hopefully soon be some content on http://eumssi.eu/
> Thanks,
> Jens
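
To make the frame-offset idea from the quoted message concrete, here is a minimal sketch of the arithmetic. It is plain Python for illustration only and does not use UIMA; the helper names (seconds_to_frame, frame_to_seconds, max_duration_hours) are hypothetical and not part of any UIMA API. It assumes timestamps are given in seconds and that offsets must fit in a signed 32-bit int, as Annotation's begin and end do.

```python
INT32_MAX = 2**31 - 1  # largest value a signed 32-bit begin/end offset can hold


def seconds_to_frame(seconds: float, fps: int = 1000) -> int:
    """Convert a timestamp in seconds to an integer frame offset (hypothetical helper)."""
    frame = round(seconds * fps)
    if not 0 <= frame <= INT32_MAX:
        raise OverflowError(f"frame offset {frame} does not fit in a signed 32-bit int")
    return frame


def frame_to_seconds(frame: int, fps: int = 1000) -> float:
    """Convert an integer frame offset back to a timestamp in seconds."""
    return frame / fps


def max_duration_hours(fps: int) -> float:
    """Longest recording, in hours, whose frame offsets still fit in 32 bits."""
    return INT32_MAX / fps / 3600


if __name__ == "__main__":
    for fps in (100, 1000):
        print(f"{fps} fps: up to {max_duration_hours(fps):.1f} hours")
```

At 1000 fps the limit works out to roughly 596.5 hours (about 24.9 days) of media per sofa, and ten times that at 100 fps, which is consistent with the expectation above that the 32-bit range is not a practical limit.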
