uima-user mailing list archives

From William Colen <william.co...@gmail.com>
Subject Re: Sorting overlapping annotation of same type using UIMAFIT
Date Tue, 22 Nov 2016 01:05:02 GMT
Thank you, Marshall.
What if they are of the same type?
My workaround was to add an integer feature to the annotation and use it
to sort the annotations. It is not a good approach, because the user has
to remember to sort them before use.
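
A minimal sketch of that workaround, written against plain Java rather than
the UIMA API so it runs on its own. SketchToken, its "order" field, and
TokenOrderSketch are hypothetical names for illustration only; in a real
pipeline the integer would be a feature on the token type. The comparator
applies begin (ascending) and end (descending), then falls back to the
integer:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical stand-in for a token annotation; this is NOT a UIMA class.
final class SketchToken {
    final int begin;
    final int end;
    final int order;   // assumed integer feature used only as a tie-breaker
    final String pos;  // POS tag, e.g. "prp" (preposition) or "pron" (pronoun)

    SketchToken(int begin, int end, int order, String pos) {
        this.begin = begin;
        this.end = end;
        this.order = order;
        this.pos = pos;
    }
}

final class TokenOrderSketch {
    // begin ascending, end descending, then the explicit integer ascending.
    static final Comparator<SketchToken> BY_OFFSET_THEN_ORDER =
        Comparator.comparingInt((SketchToken t) -> t.begin)
            .thenComparing(Comparator.comparingInt((SketchToken t) -> t.end).reversed())
            .thenComparingInt(t -> t.order);

    // Returns a sorted copy and leaves the caller's list untouched.
    static List<SketchToken> sorted(List<SketchToken> tokens) {
        List<SketchToken> copy = new ArrayList<>(tokens);
        copy.sort(BY_OFFSET_THEN_ORDER);
        return copy;
    }

    public static void main(String[] args) {
        // "Nós acreditávamos nele." : "nele" covers offsets 18..22 twice.
        List<SketchToken> tokens = Arrays.asList(
            new SketchToken(18, 22, 1, "pron"),  // "ele"
            new SketchToken(18, 22, 0, "prp"),   // "em"
            new SketchToken(0, 3, 0, "pron"));   // "Nós"
        for (SketchToken t : sorted(tokens)) {
            System.out.println(t.begin + "-" + t.end + " " + t.pos);
        }
    }
}
```

The drawback is the one described above: every caller has to remember to
apply the comparator after selecting the tokens.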

Thank you
William

2016-11-21 20:10 GMT-02:00 Marshall Schor <msa@schor.com>:

> The select form you're using iterates over UIMA's built-in Annotation
> index. This index sorts the annotations based on 3 criteria:
>
> 1) the begin (ascending order)
>
> 2) the end (descending order)
>
> 3) the type priority
>
> You can use the 3rd criterion to set a preference ordering among two
> annotations of different types which have the same begin / end.
> You specify the type priorities as part of Analysis Engine metadata, see
> http://uima.apache.org/d/uimaj-current/references.html#ugr.ref.xml.component_descriptor.aes.primitive
>
> -Marshall
>
> On 11/20/2016 9:52 PM, William Colen wrote:
> > Hi,
> >
> > In Portuguese we have contractions, which are words composed of, for
> > example, a preposition + an article, a pronoun or an adverb.
> >
> > Example:
> >
> > Nós acreditávamos nele. (We believed him.)
> >
> > Where "nele" can be divided into "em" + "ele". (in + him)
> >
> > To properly analyze this, I created two token annotations with the same
> > begin and end, but I associated the first with the POS tag preposition
> > and the second with pronoun.
> >
> > This is especially important when we are doing chunking, because the
> > first token will be part of a prepositional phrase, while the second
> > will be part of a nominal phrase.
> >
> > How can I guarantee that when I call UIMAFit JCasUtil.select I will get
> > the tokens ordered, first the preposition, second the pronoun?
> >
> > Thank you,
> > William
> >
>
>
