uima-user mailing list archives

From Thilo Götz <twgo...@gmx.de>
Subject Re: Iterate over annotations with multiple types
Date Wed, 07 Sep 2011 14:56:53 GMT
On 07/09/11 15:21, Richard Eckart de Castilho wrote:
> It really depends on the data in your CAS. As far as I know, there is typically only
> one big annotation index - if you get an iterator for a specific type, a filtered iterator
> is created internally and returned. The only thing to speed up iteration is the offsets. If
> the annotations you are looking for are more or less evenly distributed throughout your text,
> it's probably faster to use a single filtered iterator than iterating separately for each
> type.
> 
> That is my understanding and experience so far. Any of the UIMA maintainers, please
> correct me if I am wrong.

Correcting.  It's actually the other way round.  There's an index for
each annotation type, and if you iterate over all annotations, the
iterators are merged at runtime.

If speed is of the essence, it's best to create an iterator
for each of the annotation types you're interested in, and
do the weaving manually.  Having said that, iterating in general
is quite fast, and unless your operations are really cheap, this
is not likely to buy you a lot.
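
A minimal sketch of what "weaving manually" could look like, kept free of
UIMA dependencies so it runs on its own. Everything named here is an
assumption for illustration: `Span` is a hypothetical stand-in for an
annotation, and `WeaveSketch.weave` is not a UIMA method. In a real CAS the
two inputs would be the per-type iterators, e.g. from
`cas.getAnnotationIndex(tokenType).iterator()`. The comparison mimics the
usual annotation order (ascending begin, then longer span first); it ignores
type priorities, so ties simply favour the first iterator.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

final class WeaveSketch {

    // Hypothetical stand-in for an annotation: a type name plus offsets.
    static final class Span {
        final String type;
        final int begin;
        final int end;

        Span(String type, int begin, int end) {
            this.type = type;
            this.begin = begin;
            this.end = end;
        }

        @Override
        public String toString() {
            return type + "[" + begin + "," + end + ")";
        }
    }

    // Assumed order: ascending begin, then descending end (longer span first).
    static int compare(Span a, Span b) {
        if (a.begin != b.begin) {
            return Integer.compare(a.begin, b.begin);
        }
        return Integer.compare(b.end, a.end);
    }

    // Merge two iterators that are each already sorted by compare().
    static List<Span> weave(Iterator<Span> left, Iterator<Span> right) {
        List<Span> out = new ArrayList<>();
        Span l = left.hasNext() ? left.next() : null;
        Span r = right.hasNext() ? right.next() : null;
        while (l != null || r != null) {
            // Take from the left side when the right is exhausted,
            // or when the left element sorts first (ties go left).
            if (r == null || (l != null && compare(l, r) <= 0)) {
                out.add(l);
                l = left.hasNext() ? left.next() : null;
            } else {
                out.add(r);
                r = right.hasNext() ? right.next() : null;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Span> sentences = Arrays.asList(
                new Span("Sentence", 0, 11), new Span("Sentence", 12, 20));
        List<Span> tokens = Arrays.asList(
                new Span("Token", 0, 5), new Span("Token", 6, 11), new Span("Token", 12, 20));
        for (Span s : weave(sentences.iterator(), tokens.iterator())) {
            System.out.println(s);
        }
    }
}
```

With more than two types, the same idea extends to a priority queue holding
the current head of each per-type iterator.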

--Thilo

> 
> Cheers,
> 
> Richard
> 
> On 07.09.2011 at 15:16, Jörn Kottmann wrote:
> 
>> Isn't this slow? Because it then needs to iterate over every
>> single AnnotationFS inside my CAS.
>>
>> Jörn
>>
>>
>> On 9/7/11 3:06 PM, Richard Eckart de Castilho wrote:
>>> Hi Jörn,
>>>
>>>> what is the best way to iterate over annotations which have
>>>> different types?
>>> you can use a filtered iterator - more or less like this:
>>>
>>> 		CAS cas = jcas.getCas();
>>> 		ConstraintFactory cf = ConstraintFactory.instance();
>>> 		FSIterator<Annotation>  iterator = jcas.getAnnotationIndex().iterator();
>>> 		Type tokenType = jcas.getCasType(Token.type);
>>> 		Type sentenceType = jcas.getCasType(Sentence.type);
>>>
>>> 		// Restrict to Tokens
>>> 		FSTypeConstraint typeConstraint1 = cf.createTypeConstraint();
>>> 		typeConstraint1.add(tokenType);
>>>
>>> 		// Restrict to Sentences
>>> 		FSTypeConstraint typeConstraint2 = cf.createTypeConstraint();
>>> 		typeConstraint2.add(sentenceType);
>>>
>>> 		// Combine both constraints using "or"
>>> 		FSMatchConstraint disjunction = cf.or(typeConstraint1, typeConstraint2);
>>>
>>> 		// Create and use the filtered iterator
>>> 		FSIterator<Annotation> filteredIterator = cas.createFilteredIterator(iterator, disjunction);
>>> 		while(filteredIterator.hasNext()) {
>>> 			System.out.println(filteredIterator.next().getCoveredText());
>>> 		}
>>>
>>> Cheers,
>>>
>>> Richard
>>>
>>
> 
> Richard Eckart de Castilho
> 
