uima-user mailing list archives

From Christian Mauceri <mauc...@hermeneute.com>
Subject Re: Iterators in CAS
Date Sat, 13 Oct 2007 23:41:29 GMT
Hi Katja,

I'm afraid my answer comes too late, but that is precisely the nice 
thing, and it is what I wanted to show in the chunk of code I sent you.
You iterate over all the annotations; once you detect an np, all the 
tokens you detect afterwards belong to this np until you reach another 
one. The iterators in UIMA are ordered by position and also by the 
inclusion hierarchy, so if you iterate over the generic annotation 
iterator and test the class of each element, you are sure to detect the 
np (0,10) of your example first, and the following elements will be of 
type token until you find another np. If your np are not contiguous, 
you can always check token.end <= np.end, but in any case the order is 
guaranteed.
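
To make the ordering argument concrete, here is a minimal plain-Java 
sketch that does not use the UIMA API. It assumes the annotation index 
orders by ascending begin, then descending end, then type priority; for 
two annotations with exactly the same span (the second case in the 
quoted message), the order would then depend on the configured type 
priorities, which the sketch models with a hypothetical typePriority 
list. Span, sortLikeAnnotationIndex and tokensByNp are names invented 
for this illustration only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java sketch; Span and the method names are hypothetical, not UIMA classes.
class AnnotationOrderSketch {

    // Stand-in for an annotation: a type name plus begin/end offsets.
    static final class Span {
        final String type;
        final int begin;
        final int end;

        Span(String type, int begin, int end) {
            this.type = type;
            this.begin = begin;
            this.end = end;
        }

        @Override
        public String toString() {
            return type + "(" + begin + "," + end + ")";
        }
    }

    // Assumed ordering: begin ascending, then end descending (longer spans first),
    // then position in typePriority (an earlier entry sorts first).
    static List<Span> sortLikeAnnotationIndex(List<Span> spans, final List<String> typePriority) {
        List<Span> sorted = new ArrayList<Span>(spans);
        sorted.sort((a, b) -> {
            if (a.begin != b.begin) return Integer.compare(a.begin, b.begin);
            if (a.end != b.end) return Integer.compare(b.end, a.end);
            return Integer.compare(typePriority.indexOf(a.type), typePriority.indexOf(b.type));
        });
        return sorted;
    }

    // Single pass over the sorted list: remember the latest NP and attach each
    // Token whose end does not exceed the end of that NP.
    static Map<Span, List<Span>> tokensByNp(List<Span> sorted) {
        Map<Span, List<Span>> result = new LinkedHashMap<Span, List<Span>>();
        Span currentNp = null;
        for (Span s : sorted) {
            if (s.type.equals("NP")) {
                currentNp = s;
                result.put(s, new ArrayList<Span>());
            } else if (s.type.equals("Token") && currentNp != null && s.end <= currentNp.end) {
                result.get(currentNp).add(s);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Span> spans = Arrays.asList(
                new Span("Token", 6, 10), new Span("Token", 0, 5),
                new Span("NP", 0, 10), new Span("Token", 11, 14));
        List<Span> sorted = sortLikeAnnotationIndex(spans, Arrays.asList("NP", "Token"));
        System.out.println(sorted);
        System.out.println(tokensByNp(sorted));
    }
}
```

With the annotations of the example, the np (0,10) sorts before the 
tokens (0,5) and (6,10), and a token that starts after the np ends is 
not attached to it. Nested np are not handled by this single-pass 
sketch.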

Ekaterina Buyko wrote:
> Hi Christian,
>
> Thank you very much.
>
> What I originally had in mind was a method in UIMA such as:
> Sentence [] sentence = token.getOverlapAnnotation (Sentence.type);
>
> But I still have some questions about your proposal:
>
> Getting an iterator over all annotations is fine.
> Do you know what order the annotations are in?
>
> If I have for example the annotations (numbers are respective begin 
> and end)
> NP np (0,10)
> Token token1(0,5), token2(6, 10)
>
> Then I get the index. How are they ordered?
> np, token1, token2?
>
> And what happens if they have the same span?
> NP np (0,5)
> Token token1(0,5)
>
> With best regards
>
> Katja
>
>
>
> Christian Mauceri wrote:
>> Hi Ekaterina,
>>
>> if I understood your question correctly, it is possible and is even a 
>> nice feature of UIMA. I have more or less the same problem: I have two 
>> types of annotations, contexts and forms (sentences and tokens for 
>> you). So I have TAEs which mark contexts and forms, then I have 
>> another TAE (a CAS consumer in my very simple case) which does the 
>> following:
>>
>>       // A context
>>        TCollocation tc = null;
>>       // A form
>>        TForm f = null;
>>
>>       // First, iterate over all the annotations
>>        Iterator annot =
>>            jcas.getJFSIndexRepository().getAnnotationIndex().iterator();
>>        while(annot.hasNext()) {
>>            Annotation a = (Annotation)annot.next();
>>            // then test whether it is a context (TCollocation) or a form (TForm)
>>            if (a instanceof TCollocation) {
>>                tc = (TCollocation)a;
>>                //System.out.println(tc.getMatch());
>>            } else if (a instanceof TForm) {
>>                f = (TForm) a;
>>            }
>>        }
>>
>> That's all. The nice thing is that the iterator respects the position 
>> order in the text and the inclusion hierarchy, so you are sure the 
>> current form belongs to the current context.
>>
>> I hope this is helpful and that I did not talk nonsense; at least it 
>> works fine for me.
>>
>> Regards.
>> Christian.
>>
>>
>> Ekaterina Buyko wrote:
>>> Hi all!
>>>
>>> In UIMA 2.1 it is possible to create a sub-iterator in order to 
>>> iterate over annotations which are within the begin-end span of the 
>>> selected type.
>>>
>>> For example:
>>>
>>> AnnotationIndex sentenceIndex = (AnnotationIndex) aJCas 
>>> .getJFSIndexRepository().getAnnotationIndex(Sentence.type);
>>>
>>> AnnotationIndex tokenIndex = (AnnotationIndex) aJCas
>>>                .getJFSIndexRepository().getAnnotationIndex(Token.type);
>>>
>>>        // iterate over Sentences
>>>        FSIterator sentenceIterator = sentenceIndex.iterator();
>>>        while (sentenceIterator.hasNext()) {
>>>
>>>            Sentence sentence = (Sentence) sentenceIterator.next();
>>>
>>>            // iterate over Tokens
>>>            FSIterator tokenIterator = tokenIndex.subiterator(sentence);
>>>        }
>>>
>>>
>>> I would like to have more extended functionality. I need to know 
>>> the annotations which lie in the begin-end span of the selected 
>>> annotation type. These annotations can overlap the span of the 
>>> selected type.
>>>
>>> Take noun phrases, for example. If I iterate over tokens, I would 
>>> like to know whether each token is inside a noun phrase or not. For 
>>> now, I am working with Hashtables, but I am looking for another 
>>> solution.
>>>
>>> How could I solve this problem?
>>>
>>> Best regards
>>>
>>> Ekaterina
>>>
>>>
>>>
>>>
>>
>
>
>

-- 
Cordialement/Regards
Christian Mauceri
http://hermeneute.com/Christian

