uima-user mailing list archives

From Marshall Schor <...@schor.com>
Subject Re: Sorting overlapping annotation of same type using UIMAFIT
Date Wed, 23 Nov 2016 14:33:04 GMT
UIMA allows you to define custom indexes.  So you can define a new sorted index
(for example, let's name it "nameOfYourNewIndex") that is like the built-in
annotation index, except that its keys are 1) the begin feature, ascending,
2) the end feature, descending, and 3) the extra feature you added to order
otherwise equal annotations.  You would define this index to be over the most
specific type that is the type or supertype of all Feature Structures you want
this index to apply to (let's say you have a JCas class for this, called
JCasClassOfTheType).

Then, rather than relying on uimaFIT's select, you can use your own index (see
docs), which includes your extra feature.  You would use a form such as this:

// get the index instance from the JCas
FSIndex<JCasClassOfTheType> index = jcas.getIndex("nameOfYourNewIndex",
JCasClassOfTheType.class);

// get an iterator from the index
FSIterator<JCasClassOfTheType> iterator = index.iterator();

With this, there is no need to have the user first collect all the instances,
and then sort them; UIMA does this for you.
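
The ordering such an index would maintain can be sketched in plain Java. This is
only an illustration of the three keys and does not use any UIMA API: the Span
class, its "order" field (standing in for the extra feature), and the offsets
are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class IndexOrderSketch {

    // Hypothetical stand-in for an annotation; this is not a UIMA class.
    static final class Span {
        final int begin;
        final int end;
        final int order;     // hypothetical extra feature used as tie-breaker
        final String label;

        Span(int begin, int end, int order, String label) {
            this.begin = begin;
            this.end = end;
            this.order = order;
            this.label = label;
        }
    }

    // Same key sequence as the custom index described above:
    // begin ascending, end descending, then the extra feature ascending.
    static final Comparator<Span> INDEX_ORDER =
        Comparator.comparingInt((Span s) -> s.begin)
            .thenComparing(Comparator.comparingInt((Span s) -> s.end).reversed())
            .thenComparingInt(s -> s.order);

    // Returns the labels in the order the comparator imposes.
    static List<String> sortedLabels(List<Span> spans) {
        List<Span> copy = new ArrayList<>(spans);
        copy.sort(INDEX_ORDER);
        List<String> labels = new ArrayList<>();
        for (Span s : copy) {
            labels.add(s.label);
        }
        return labels;
    }

    public static void main(String[] args) {
        List<Span> spans = new ArrayList<>();
        // Two tokens covering the same text, added in the "wrong" order.
        spans.add(new Span(18, 22, 2, "pronoun"));
        spans.add(new Span(4, 17, 1, "verb"));
        spans.add(new Span(18, 22, 1, "preposition"));
        System.out.println(sortedLabels(spans));
    }
}
```

The two spans with identical begin and end come out in the order given by the
extra feature, which is the behaviour the third index key is meant to provide.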

Hope this helps!  -Marshall


On 11/21/2016 8:05 PM, William Colen wrote:
> Thank you, Marshall.
> What if they are of the same type?
> The workaround for me was to add a feature in which I store an integer that
> I use to sort the annotations. It is not a good approach because the user
> will need to remember to sort them before use.
>
> Thank you
> William
>
> 2016-11-21 20:10 GMT-02:00 Marshall Schor <msa@schor.com>:
>
>> The select form you're using iterates using UIMA's built-in Annotation
>> index.
>> This index is sorting the annotations based on 3 criteria:
>>
>> 1) the begin (ascending order)
>>
>> 2) the end (descending order)
>>
>> 3) the type priority
>>
>> You can use the 3rd criterion to set a preference ordering among two
>> annotations of different types, which have the same begin / end.
>> You specify the type priorities as part of Analysis Engine metadata, see
>> http://uima.apache.org/d/uimaj-current/references.html#ugr.ref.xml.component_descriptor.aes.primitive
>>
>> -Marshall
>>
>> On 11/20/2016 9:52 PM, William Colen wrote:
>>> Hi,
>>>
>>> In Portuguese we have contractions, which are words composed of, for
>>> example, a preposition + article, pronoun or an adverb.
>>>
>>> Example:
>>>
>>> Nós acreditávamos nele. (We believed him.)
>>>
>>> Where "nele" can be divided into "em" + "ele". (in + him)
>>>
>>> To properly analyze this, I created two token annotations with the same
>>> begin and end; the first I associated with the POS tag preposition, and
>>> the second with pronoun.
>>>
>>> This is especially important when we are doing chunking, because the first
>>> token will be part of a prepositional phrase, while the second will be part
>>> of a nominal phrase.
>>>
>>> How can I guarantee that when I call uimaFIT's JCasUtil.select I will get
>>> the tokens ordered, first the preposition, second the pronoun?
>>>
>>> Thank you,
>>> William
>>>

