uima-user mailing list archives

From Eddie Epstein <eaepst...@gmail.com>
Subject Re: Restrictions on sofa data array
Date Tue, 27 Apr 2010 19:58:05 GMT
On Tue, Apr 27, 2010 at 10:59 AM, Thilo Goetz <twgoetz@gmx.de> wrote:
> My understanding is that he wants the tokens as primitives,
> not the characters.  Annotation offsets could then be token
> offsets, not character offsets.  That's perfectly reasonable
> for some tasks.  We usually create annotations with the start
> offset being the start of some token, and the end offset the
> end of some token.  Then it's hard to find the tokens that
> are "covered" by the annotation, which is why we have
> subiterators, which are not super efficient.  And so on.
> I like the idea, but I have no idea how compatible it is with
> UIMA's idea of views and sofas.

A StringArrayFS can be used as Sofa data. Moreover, a new
annotation type derived from AnnotationBase can be used
to point into the StringArray, and if using JCas it could have
a getCoveredText() method or other functional capabilities.
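
To make the idea concrete, here is a minimal sketch in plain Java. It does not
use the UIMA API: TokenSpan is a hypothetical stand-in for an annotation type
derived from AnnotationBase, and the String[] stands in for a StringArrayFS
set as Sofa data. The begin/end fields are token offsets (end exclusive), and
joining tokens with a single space in getCoveredText() is an assumption.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for an annotation type derived from AnnotationBase.
// It points into an array of tokens rather than into a character string.
final class TokenSpan {
    private final String[] sofaTokens; // stand-in for the StringArrayFS Sofa data
    private final int begin;           // index of the first covered token (inclusive)
    private final int end;             // index after the last covered token (exclusive)

    TokenSpan(String[] sofaTokens, int begin, int end) {
        if (begin < 0 || end < begin || end > sofaTokens.length) {
            throw new IllegalArgumentException("token offsets out of range");
        }
        this.sofaTokens = sofaTokens;
        this.begin = begin;
        this.end = end;
    }

    // Covered tokens are a direct slice of the array; no subiterator is needed.
    List<String> getCoveredTokens() {
        return Arrays.asList(Arrays.copyOfRange(sofaTokens, begin, end));
    }

    // Assumption: tokens are joined with a single space.
    String getCoveredText() {
        return String.join(" ", getCoveredTokens());
    }
}

class TokenSofaDemo {
    public static void main(String[] args) {
        String[] tokens = {"The", "quick", "brown", "fox"};
        TokenSpan span = new TokenSpan(tokens, 1, 3);
        System.out.println(span.getCoveredText()); // prints "quick brown"
    }
}
```

With token offsets, finding the tokens covered by an annotation is an array
slice, which is the efficiency gain discussed above.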

Thanks for explaining the scenario!

