uima-user mailing list archives

From Richard Eckart de Castilho <eckar...@tk.informatik.tu-darmstadt.de>
Subject Re: How to remove UIMA annotations?
Date Fri, 03 Jun 2011 11:31:08 GMT
Hello,

You need to collect the annotations that you want to remove in a separate list while you
iterate over the annotation index. Once you are done iterating, iterate over that list and
remove each annotation from the indexes. You cannot add or remove annotations while you
iterate over the annotation index; doing so throws a ConcurrentModificationException.
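
A minimal sketch of the collect-then-remove pattern described above. To stay self-contained it uses a plain java.util list as a stand-in for the annotation index; the Token class, the removeNames helper and the name set are hypothetical and only for illustration. With a real JCas, the second loop would presumably call removeFromIndexes() on each collected annotation instead of index.remove(...).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class CollectThenRemove {

    /** Hypothetical stand-in for a UIMA Token annotation. */
    public static class Token {
        public final int begin;
        public final int end;
        public final String text;

        public Token(int begin, int end, String text) {
            this.begin = begin;
            this.end = end;
            this.text = text;
        }
    }

    /**
     * Removes every token whose lower-cased text is in names and returns
     * the removed tokens. The index is never modified while it is iterated.
     */
    public static List<Token> removeNames(List<Token> index, Set<String> names) {
        // Pass 1: only read from the index and remember what should go.
        List<Token> toRemove = new ArrayList<>();
        for (Token token : index) {
            if (names.contains(token.text.toLowerCase())) {
                toRemove.add(token);
            }
        }
        // Pass 2: iteration over the index has finished, so modifying it is safe.
        // Assumption: with UIMA this would be token.removeFromIndexes().
        for (Token token : toRemove) {
            index.remove(token);
        }
        return toRemove;
    }

    public static void main(String[] args) {
        List<Token> index = new ArrayList<>();
        index.add(new Token(0, 4, "John"));
        index.add(new Token(5, 10, "works"));
        index.add(new Token(11, 15, "here"));

        List<Token> removed = removeNames(index, Set.of("john"));
        System.out.println(removed.size() + " removed, " + index.size() + " left");
    }
}
```

The same two-pass shape should also cover additions: new annotations can be collected in a second list during the first pass and added to the indexes once iteration is finished.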

Cheers,

Richard

On 03.06.2011 at 12:55, Rashad wrote:

> Hi,
> 
> I'm reasonably new to UIMA and trying to get it to do what I want. I'm 
> attempting to perform entity extraction on 3 languages. I have an if statement 
> at the start of each analysis engine (AE) that skips the document if its 
> language is not, for example, English; another AE detects the language to 
> begin with.
> 
> The next AE then tokenises the document (space tokeniser), the next AE 
> extracts entities, and a CAS consumer then writes the result to disk.
> 
> However, I don't want to write ALL the space-tokenised annotations to disk 
> as well, only the extracted entities, as the files get very large very 
> quickly! Once a token has been processed I want it removed from the 
> CAS/JCas, but token.removeFromIndexes() (I'm using Java) just throws a 
> ConcurrentModificationException.
> 
> How do I get around this?
> 
> This is my code:
> 
> AnnotationIndex<Annotation> token = aJCas.getAnnotationIndex(Token.type);
> FSIterator<Annotation> timeIter = token.iterator();
> while (timeIter.hasNext()) {
>     Token currentToken = (Token) timeIter.next();
>     Token previousToken = null;
>     if (englishNamesAsTrie.search(currentToken.getToken().toLowerCase())) {
>         PersonName annotation = new PersonName(aJCas);
>         annotation.setBegin(currentToken.getBegin());
>         annotation.setEnd(currentToken.getEnd());
>         annotation.addToIndexes(aJCas);
>         currentToken.removeFromIndexes(aJCas);
>     }
> }
> }
> 

-- 
------------------------------------------------------------------- 
Richard Eckart de Castilho
Technical Lead
Ubiquitous Knowledge Processing Lab 
FB 20 Computer Science Department      
Technische Universität Darmstadt 
Hochschulstr. 10, D-64289 Darmstadt, Germany 
phone [+49] (0)6151 16-7477, fax -5455, room S2/02/B117
eckartde@tk.informatik.tu-darmstadt.de 
www.ukp.tu-darmstadt.de 
Web Research at TU Darmstadt (WeRC) www.werc.tu-darmstadt.de
------------------------------------------------------------------- 




