uima-user mailing list archives

From William Colen <william.co...@gmail.com>
Subject Re: Sorting overlapping annotation of same type using UIMAFIT
Date Fri, 25 Nov 2016 22:48:15 GMT
Great! Thank you!


2016-11-23 12:33 GMT-02:00 Marshall Schor <msa@schor.com>:

> UIMA allows you to define custom indexes.  So you can define a new sorted
> index (for example, let's name it "nameOfYourNewIndex") that is like the
> built-in annotation index, except that its keys are 1) the begin feature,
> ascending, 2) the end feature, descending, and 3) the special extra feature
> you have to sort otherwise equal annotations.  You would define this index
> to be over the most specific type that is the type or supertype of all
> Feature Structures you want this index to apply to (let's say you have a
> JCas class for this, called JCasClassOfTheType).
>
> Then, rather than uimaFIT's select, you can use your own index (see docs),
> which includes your extra feature.  You would use a form such as this:
>
> // get the index instance from the JCas
> FSIndex<JCasClassOfTheType> index =
>     jcas.getIndex("nameOfYourNewIndex", JCasClassOfTheType.class);
>
> // get an iterator from the index
> FSIterator<JCasClassOfTheType> iterator = index.iterator();
>
> With this, there is no need to have the user first collect all the
> instances and then sort them; UIMA does this for you.
>
> Hope this helps!  -Marshall
>
>
> On 11/21/2016 8:05 PM, William Colen wrote:
> > Thank you, Marshall.
> > What if they are of the same type?
> > The workaround for me was to add a feature in which I store an integer
> > that I use to sort the annotations. It is not a good approach because the
> > user will need to remember to sort them before using them.
> >
> > Thank you
> > William
> >
> > 2016-11-21 20:10 GMT-02:00 Marshall Schor <msa@schor.com>:
> >
> >> The select form you're using iterates using UIMA's built-in Annotation
> >> index.  This index sorts the annotations based on 3 criteria:
> >>
> >> 1) the begin (ascending order)
> >>
> >> 2) the end (descending order)
> >>
> >> 3) the type priority
> >>
> >> You can use the 3rd criterion to set a preference ordering between two
> >> annotations of different types that have the same begin / end.
> >> You specify the type priorities as part of Analysis Engine metadata, see
> >> http://uima.apache.org/d/uimaj-current/references.html#ugr.ref.xml.component_descriptor.aes.primitive
> >>
> >> -Marshall
> >>
> >> On 11/20/2016 9:52 PM, William Colen wrote:
> >>> Hi,
> >>>
> >>> In Portuguese we have contractions, which are words composed of, for
> >>> example, a preposition + an article, a pronoun or an adverb.
> >>>
> >>> Example:
> >>>
> >>> Nós acreditávamos nele. (We believed him.)
> >>>
> >>> Where "nele" can be divided into "em" + "ele". (in + him)
> >>>
> >>> To properly analyze this, I created two token annotations with the same
> >>> begin and end, but I associated the first with the POS tag preposition
> >>> and the second with the POS tag pronoun.
> >>>
> >>> This is especially important when we are doing chunking, because the
> >>> first token will be part of a prepositional phrase, while the second
> >>> will be part of a nominal phrase.
> >>>
> >>> How can I guarantee that when I call uimaFIT's JCasUtil.select I will
> >>> get the tokens in order, first the preposition and then the pronoun?
> >>>
> >>> Thank you,
> >>> William
> >>>
>
>
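
The key order Marshall describes for the custom index (begin ascending, end descending, then the extra integer feature) can be modelled in plain Java to see how the two "nele" tokens would come out. This is only a sketch of the ordering, not UIMA code: the Token class, its order field, and the TokenOrderDemo name are hypothetical stand-ins for the real annotation type and feature, and the character offsets are worked out by hand for the example sentence.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class TokenOrderDemo {

    // Hypothetical stand-in for a token annotation; this is not a UIMA class.
    public static final class Token {
        public final int begin;
        public final int end;
        public final int order;   // assumed extra integer feature used as tie-breaker
        public final String pos;

        public Token(int begin, int end, int order, String pos) {
            this.begin = begin;
            this.end = end;
            this.order = order;
            this.pos = pos;
        }
    }

    // Same key sequence as the custom index described in the thread:
    // 1) begin ascending, 2) end descending, 3) extra feature ascending.
    public static final Comparator<Token> INDEX_ORDER = (a, b) -> {
        if (a.begin != b.begin) {
            return Integer.compare(a.begin, b.begin);
        }
        if (a.end != b.end) {
            return Integer.compare(b.end, a.end); // reversed: longer span first
        }
        return Integer.compare(a.order, b.order);
    };

    // Returns a sorted copy and leaves the input list untouched.
    public static List<Token> sorted(List<Token> tokens) {
        List<Token> copy = new ArrayList<>(tokens);
        copy.sort(INDEX_ORDER);
        return copy;
    }

    public static void main(String[] args) {
        // "Nós acreditávamos nele." with "nele" (offsets 18-22) split into em + ele.
        List<Token> tokens = Arrays.asList(
            new Token(18, 22, 2, "pronoun"),      // ele, added first on purpose
            new Token(18, 22, 1, "preposition"),  // em
            new Token(4, 17, 1, "verb"),
            new Token(0, 3, 1, "pronoun"));
        for (Token t : sorted(tokens)) {
            System.out.println(t.begin + "-" + t.end + " " + t.pos);
        }
    }
}
```

In a real pipeline the sorted index would do this work when annotations are added to the indexes, so callers would iterate the index rather than sort a list themselves.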
