uima-user mailing list archives

From: Jens Grivolla <j+...@grivolla.net>
Subject: Re: big offsets efficiency, and multiple offsets
Date: Thu, 05 Dec 2013 09:18:28 GMT

I forgot to say that the text analysis view(s) will necessarily have to 
use character offsets so that we can obtain the coveredText, which means 
that all resulting annotations will also use character offsets.  The 
merged view will need to use time-based offsets which means that we have 
to recreate the annotations there with mapped offsets rather than just 
index the same annotations in a different view.

I think that basically means that we won't do much cross-view querying 
but rather have one component (AE) that reads from all views and creates 
a new one with new independent annotations after mapping the offsets.
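
A minimal sketch of the offset-mapping step described above, in plain Java with no UIMA dependency. The class and method names (`OffsetMapper`, `addAnchor`, `toFrame`, `mapSpan`) are hypothetical, and it assumes that alignment anchors (pairs of character offset and frame number, e.g. from word-level timings of the transcript) are available. Offsets between two anchors are linearly interpolated, which is only an approximation. The merging AE would call something like `mapSpan` for each text annotation and create a new annotation in the merged view with the returned frame offsets.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper: maps character offsets of a transcript to frame offsets. */
final class OffsetMapper {
    // Key: character offset of an alignment anchor; value: frame number at that anchor.
    private final TreeMap<Integer, Integer> anchors = new TreeMap<>();

    void addAnchor(int charOffset, int frame) {
        anchors.put(charOffset, frame);
    }

    /** Interpolates linearly between the two nearest anchors; clamps outside the anchored range. */
    int toFrame(int charOffset) {
        if (anchors.isEmpty()) {
            throw new IllegalStateException("no alignment anchors");
        }
        Map.Entry<Integer, Integer> lo = anchors.floorEntry(charOffset);
        Map.Entry<Integer, Integer> hi = anchors.ceilingEntry(charOffset);
        if (lo == null) {
            return hi.getValue();   // before the first anchor
        }
        if (hi == null) {
            return lo.getValue();   // after the last anchor
        }
        if (lo.getKey().equals(hi.getKey())) {
            return lo.getValue();   // exactly on an anchor
        }
        long charSpan = (long) hi.getKey() - lo.getKey();
        long frameSpan = (long) hi.getValue() - lo.getValue();
        return (int) (lo.getValue() + frameSpan * (charOffset - lo.getKey()) / charSpan);
    }

    /** Maps a character span to a frame span, returned as {beginFrame, endFrame}. */
    int[] mapSpan(int begin, int end) {
        return new int[] { toFrame(begin), toFrame(end) };
    }
}
```

A TreeMap keeps the anchors sorted, so each lookup costs O(log n) and the anchors can be added in any order.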

-- Jens

On 05/12/13 10:04, Jens Grivolla wrote:
> I agree that it might make more sense to model our needs more directly
> instead of trying to squeeze it into the schema we normally use for text
> processing.  But at the same time I would of course like to avoid having
> to reimplement many of the things that are already available when using
> AnnotationBase.
>
> For the cross-view indexing issue I was thinking of creating individual
> views for each modality and then a merged view that just contains a
> subset of annotations of each view, and on which we would do the
> cross-modal reasoning.
>
> I just looked again at the GaleMultiModalExample (not much there,
> unfortunately) and saw that e.g. AudioSpan derives from AnnotationBase
> but still has float values for begin/end.  I would be really interested
> in learning more about what was done in GALE, but it's hard to find any
> relevant information...
>
> Thanks,
> Jens
>
> On 04/12/13 20:16, Marshall Schor wrote:
>> Echoing Richard,
>>
>> 1) It would perhaps make more sense to be more direct about each of the
>> different types of data.  UIMA "built in" only the most "popular"
>> things, and Annotation was one of them.
>>
>> Annotation derives from AnnotationBase, which just defines an
>> associated Sofa / view.
>>
>> So it would make more sense to define different kinds of highest-level
>> abstractions for your project, related to the different kinds of
>> views/sofas.  Audio might entail a begin / end style of offsets; images
>> might entail a pair of x-y coordinates, to describe a (rectangular)
>> subset of an image.  Video might do something like audio, or something
>> more complex...
>>
>> UIMA's use of AnnotationBase includes ensuring that when you
>> add-to-indexes (an operation that implicitly takes a "view" and adds a
>> FS to that view), if the FS is a subtype of AnnotationBase, then the FS
>> must be indexed in the view to which that FS "belongs".  If you try to
>> add-to-indexes in a view other than the one the FS was created in, you
>> get this kind of error:
>>
>>    Error - the Annotation "{0}" is over view "{1}" and cannot be added
>>    to indexes associated with the different view "{2}".
>>
>> The logic behind this restriction is: an Annotation (or, more
>> generally, an object having a supertype of AnnotationBase) is (by
>> definition) associated with a particular Sofa/View, and it is most
>> likely an error if that annotation is indexed with a sofa it doesn't
>> belong with.
>>
>> Of course, Feature Structures which are not Annotations (or more
>> generally, not derived from AnnotationBase) can be indexed in multiple
>> views.
>>
>> 2) By keeping separate notions for pointers-into-the-Sofa, you can define
>> algorithmic mappings for these that make the best sense for your project,
>> including notions of fuzziness, time-shift (imagine the audio is
>> out of sync with the video, as many YouTube videos seem to be), etc.
>>
>> -Marshall
>> On 12/4/2013 9:31 AM, Jens Grivolla wrote:
>>> Hi, we're now starting the EUMSSI project, which deals with integrating
>>> annotation layers coming from audio, video and text analysis.
>>>
>>> We're thinking of basing it all on UIMA, having different views with
>>> separate audio, video, transcribed text, etc. sofas.  In order to align
>>> the different views we need a common offset specification that allows
>>> us to map e.g. character offsets to the corresponding timestamps.
>>>
>>> In order to avoid float timestamps (which would mean we can't derive
>>> from Annotation) I was thinking of using audio/video frames with e.g.
>>> 100 or 1000 frames/second.  Annotation has begin and end defined as
>>> signed 32-bit ints, leaving sufficient room for very long documents
>>> even at 1000 fps, so I don't think we're going to run into any limits
>>> there.  Is there anything that could become problematic when working
>>> with offsets that are probably quite a bit larger than what is
>>> typically found with character offsets?
>>>
>>> Also, can I have several indexes on the same annotations in order to
>>> work with character offsets for text analysis, but then efficiently
>>> query for overlapping annotations from other views based on frame
>>> offsets?
>>>
>>> Btw, if you're interested in the project we have a writeup (condensed
>>> from the project proposal) here:
>>> https://dl.dropboxusercontent.com/u/4169273/UIMA_EUMSSI.pdf
>>> and there will hopefully soon be some content on http://eumssi.eu/
>>>
>>> Thanks,
>>> Jens
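
A quick check of the range estimate in the original message at the bottom of this thread: a signed 32-bit offset tops out at 2,147,483,647 frames, which is roughly 24.8 days of media at 1000 fps and roughly 248 days at 100 fps. The sketch below is plain Java with no UIMA dependency; `FrameRange`, `maxSeconds` and `toFrame` are hypothetical names used only for illustration. The conversion uses `Math.toIntExact` so that an out-of-range timestamp raises an exception instead of silently wrapping around.

```java
/** Hypothetical helper for working with integer frame offsets. */
final class FrameRange {
    /** Longest duration, in whole seconds, addressable by a signed 32-bit frame offset. */
    static long maxSeconds(int framesPerSecond) {
        return Integer.MAX_VALUE / framesPerSecond;
    }

    /** Converts a timestamp in seconds to a frame offset; throws ArithmeticException on overflow. */
    static int toFrame(double seconds, int framesPerSecond) {
        return Math.toIntExact(Math.round(seconds * framesPerSecond));
    }
}
```

For example, `FrameRange.maxSeconds(1000)` is 2147483 seconds, and `FrameRange.toFrame(1.5, 1000)` is 1500.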
