uima-user mailing list archives

From Richard Eckart de Castilho <eckar...@tk.informatik.tu-darmstadt.de>
Subject Re: Iterate over annotations with multiple types
Date Wed, 07 Sep 2011 13:21:33 GMT
It really depends on the data in your CAS. As far as I know, there is typically only one big
annotation index: if you request an iterator for a specific type, a filtered iterator is created
internally and returned. The only thing that can speed up iteration is restricting it by offsets.
If the annotations you are looking for are distributed more or less evenly throughout your text,
a single filtered iterator is probably faster than iterating separately for each type.
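To make the cost argument concrete, here is a toy plain-Java model (not UIMA code). It takes the assumption stated above at face value: all annotations sit in one index sorted by begin offset, and every type-filtered iterator has to inspect each entry. `AnnotationScanModel`, `Ann`, `Scan` and `sample` are hypothetical names made up for this sketch, and the real UIMA internals may differ from this model.

```java
import java.util.List;
import java.util.Set;

/**
 * Hypothetical model (not UIMA code): all annotations live in one index sorted by
 * begin offset, and a type-filtered iterator has to inspect every entry.
 */
public class AnnotationScanModel {

    /** Stand-in for an annotation: type name plus begin/end offsets. */
    public record Ann(String type, int begin, int end) {}

    /** How many index entries were inspected and how many matched. */
    public record Scan(int visited, int matched) {}

    /** One filtered pass whose constraint accepts any of the wanted types. */
    public static Scan singlePass(List<Ann> index, Set<String> wanted) {
        int visited = 0;
        int matched = 0;
        for (Ann a : index) {
            visited++;
            if (wanted.contains(a.type())) {
                matched++;
            }
        }
        return new Scan(visited, matched);
    }

    /** One separate filtered pass per wanted type; each pass rescans the whole index. */
    public static Scan perTypePasses(List<Ann> index, Set<String> wanted) {
        int visited = 0;
        int matched = 0;
        for (String type : wanted) {
            for (Ann a : index) {
                visited++;
                if (a.type().equals(type)) {
                    matched++;
                }
            }
        }
        return new Scan(visited, matched);
    }

    /** Small sample: 2 sentences, 3 tokens, 3 part-of-speech annotations. */
    public static List<Ann> sample() {
        return List.of(
                new Ann("Sentence", 0, 11),
                new Ann("Token", 0, 5),
                new Ann("POS", 0, 5),
                new Ann("Token", 6, 11),
                new Ann("POS", 6, 11),
                new Ann("Sentence", 12, 20),
                new Ann("Token", 12, 20),
                new Ann("POS", 12, 20));
    }

    public static void main(String[] args) {
        Set<String> wanted = Set.of("Token", "Sentence");
        System.out.println(singlePass(sample(), wanted));
        System.out.println(perTypePasses(sample(), wanted));
    }
}
```

With the sample data, both approaches find the same 5 matches, but the single pass inspects 8 entries while the per-type passes inspect 16. Under this model the gap grows linearly with the number of types requested.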

That is my understanding and experience so far. UIMA maintainers, please correct me if
I am wrong.

Cheers,

Richard

On 07.09.2011 at 15:16, Jörn Kottmann wrote:

> Isn't this slow? Because it then needs to iterate over every
> single AnnotationFS inside my CAS.
> 
> Jörn
> 
> 
> On 9/7/11 3:06 PM, Richard Eckart de Castilho wrote:
>> Hi Jörn,
>> 
>>> what is the best way to iterate over annotations which have
>>> different types?
>> you can use a filtered iterator - more or less like this:
>> 
>> 	import org.apache.uima.cas.CAS;
>> 	import org.apache.uima.cas.ConstraintFactory;
>> 	import org.apache.uima.cas.FSIterator;
>> 	import org.apache.uima.cas.FSMatchConstraint;
>> 	import org.apache.uima.cas.FSTypeConstraint;
>> 	import org.apache.uima.cas.Type;
>> 	import org.apache.uima.jcas.tcas.Annotation;
>> 	// Token and Sentence: the JCas classes generated from your own type system
>> 
>> 		CAS cas = jcas.getCas();
>> 		ConstraintFactory cf = ConstraintFactory.instance();
>> 		FSIterator<Annotation>  iterator = jcas.getAnnotationIndex().iterator();
>> 		Type tokenType = jcas.getCasType(Token.type);
>> 		Type sentenceType = jcas.getCasType(Sentence.type);
>> 
>> 		// Restrict to Tokens
>> 		FSTypeConstraint typeConstraint1 = cf.createTypeConstraint();
>> 		typeConstraint1.add(tokenType);
>> 
>> 		// Restrict to Sentences
>> 		FSTypeConstraint typeConstraint2 = cf.createTypeConstraint();
>> 		typeConstraint2.add(sentenceType);
>> 
>> 		// Combine both constraints using "or"
>> 		FSMatchConstraint disjunction = cf.or(typeConstraint1, typeConstraint2);
>> 
>> 		// Create and use the filtered iterator
>> 		FSIterator<Annotation>  filteredIterator = cas.createFilteredIterator(iterator, disjunction);
>> 		while(filteredIterator.hasNext()) {
>> 			System.out.println(filteredIterator.next().getCoveredText());
>> 		}
>> 
>> Cheers,
>> 
>> Richard
>> 
> 
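The quoted snippet can be mirrored without a CAS in a hypothetical plain-Java model, which makes the "or"-combination of type constraints easy to try out. Here `java.util.function.Predicate` stands in for `FSTypeConstraint`, `Predicate.or` for `ConstraintFactory.or`, and a stream filter for `CAS.createFilteredIterator`. `FilteredIteratorModel`, `Ann`, `typeConstraint`, `coveredTexts` and `sampleIndex` are illustrative names only and are not part of UIMA.

```java
import java.util.List;
import java.util.function.Predicate;

/** Hypothetical plain-Java model of a type-filtered annotation iterator (not UIMA code). */
public class FilteredIteratorModel {

    /** Stand-in for an annotation: type name plus begin/end character offsets. */
    public record Ann(String type, int begin, int end) {
        public String coveredText(String text) {
            return text.substring(begin, end);
        }
    }

    /** Analogue of a type constraint that accepts exactly one type name. */
    public static Predicate<Ann> typeConstraint(String type) {
        return a -> a.type().equals(type);
    }

    /** Analogue of a filtered iterator: keep matching entries in index order. */
    public static List<String> coveredTexts(String text, List<Ann> index, Predicate<Ann> constraint) {
        return index.stream()
                .filter(constraint)
                .map(a -> a.coveredText(text))
                .toList();
    }

    /** Sample index for "Hello world. Bye now.", sorted by begin (ascending), then end (descending). */
    public static List<Ann> sampleIndex() {
        return List.of(
                new Ann("Sentence", 0, 12),
                new Ann("Token", 0, 5),
                new Ann("Lemma", 0, 5),
                new Ann("Token", 6, 11),
                new Ann("Token", 11, 12),
                new Ann("Sentence", 13, 21),
                new Ann("Token", 13, 16),
                new Ann("Token", 17, 20),
                new Ann("Token", 20, 21));
    }

    public static void main(String[] args) {
        String text = "Hello world. Bye now.";
        // "or"-combine the two type constraints, like cf.or(typeConstraint1, typeConstraint2)
        Predicate<Ann> disjunction = typeConstraint("Token").or(typeConstraint("Sentence"));
        coveredTexts(text, sampleIndex(), disjunction).forEach(System.out::println);
    }
}
```

Running `main` prints the covered text of every Sentence and Token annotation in index order. The Lemma annotation is skipped because neither constraint matches it.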

Richard Eckart de Castilho

-- 
------------------------------------------------------------------- 
Richard Eckart de Castilho
Technical Lead
Ubiquitous Knowledge Processing Lab 
FB 20 Computer Science Department      
Technische Universität Darmstadt 
Hochschulstr. 10, D-64289 Darmstadt, Germany 
phone [+49] (0)6151 16-7477, fax -5455, room S2/02/B117
eckartde@tk.informatik.tu-darmstadt.de 
www.ukp.tu-darmstadt.de 
Web Research at TU Darmstadt (WeRC) www.werc.tu-darmstadt.de
------------------------------------------------------------------- 