uima-user mailing list archives

From Rashad <rash....@gmail.com>
Subject How to remove UIMA annotations?
Date Fri, 03 Jun 2011 10:55:32 GMT
Hi,

I'm reasonably new to UIMA and trying to get it to do what I want. I'm 
attempting to perform entity extraction on 3 languages. Each analysis engine 
(AE) starts with an if statement that skips the document if its language is 
not, for example, English. An earlier AE detects the language to begin with.

The next AE then tokenises the document (space tokeniser), the AE after that 
extracts entities, and a CAS consumer then writes the result to disk.

However, I don't want to write ALL the space-tokenised annotations to disk 
as well, only the extracted entities, as the files get very large very 
quickly! Once a token has been processed I want it to be removed from the 
CAS/JCas, but token.removeFromIndexes() (I'm using Java) just throws a 
ConcurrentModificationException.

How do I get around this?

This is my code:

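AnnotationIndex<Annotation> token = aJCas.getAnnotationIndex(Token.type);
FSIterator<Annotation> timeIter = token.iterator();
while (timeIter.hasNext()) {
    Token currentToken = (Token) timeIter.next();
    Token previousToken = null;
    // Look the lower-cased token text up in the trie of English names.
    if (englishNamesAsTrie.search(currentToken.getToken().toLowerCase())) {
        // Create a PersonName annotation covering the same span as the token.
        PersonName annotation = new PersonName(aJCas);
        annotation.setBegin(currentToken.getBegin());
        annotation.setEnd(currentToken.getEnd());
        annotation.addToIndexes(aJCas);
        // Removing the token here, while timeIter is still walking the same
        // index, is what leads to the ConcurrentModificationException.
        currentToken.removeFromIndexes(aJCas);
    }
}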
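
One pattern that usually avoids this kind of exception is to collect the items to delete during the loop and remove them only after iteration has finished. The sketch below is not UIMA code: it uses a plain java.util.List as a stand-in for the annotation index, and the class name DeferredRemovalSketch, the helper isName (standing in for englishNamesAsTrie.search) and the sample names are all hypothetical. It only illustrates the failure and the deferred-removal idea under those assumptions.

```java
import java.util.ArrayList;
import java.util.ConcurrentModificationException;
import java.util.List;

// Illustration only: a List<String> plays the role of the token index.
class DeferredRemovalSketch {

    // Hypothetical stand-in for englishNamesAsTrie.search(...).
    static boolean isName(String token) {
        String t = token.toLowerCase();
        return t.equals("alice") || t.equals("bob");
    }

    // Mirrors the failing loop: the collection is modified while it is
    // still being iterated. Returns true if that raised the exception.
    static boolean removingInsideLoopFails(List<String> index) {
        try {
            for (String token : index) {
                if (isName(token)) {
                    index.remove(token); // modification during iteration
                }
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    // Deferred removal: remember what to delete, delete after the loop.
    static List<String> extractNamesThenRemove(List<String> index) {
        List<String> names = new ArrayList<>();
        List<String> toRemove = new ArrayList<>();
        for (String token : index) {
            if (isName(token)) {
                names.add(token);    // analogous to adding a PersonName
                toRemove.add(token); // postpone the removal
            }
        }
        index.removeAll(toRemove);   // safe: iteration has finished
        return names;
    }
}
```

Applied to the loop above, the analogous change would presumably be to add currentToken to a java.util.List<Token> inside the while loop and call removeFromIndexes on each collected token once the loop has ended. To drop every token rather than only the matched ones, each token would be added to that list regardless of the trie lookup.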

