uima-user mailing list archives

From Andreas Niekler <aniek...@informatik.uni-leipzig.de>
Subject ConceptMapper
Date Wed, 20 Mar 2013 11:09:04 GMT
Hello,

I am trying to use ConceptMapper to annotate multi-word units in German.
The problem is that the individual tokens of a dictionary entry are
matched on their own, for example:

Dictionary entry: FC Barcelona

In the text "The FC scored today", "FC" on its own is annotated as a DictEntry.
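To make the expected behaviour concrete, here is a minimal, self-contained sketch of contiguous matching for a multi-token entry. It is a hypothetical illustration, not ConceptMapper's implementation: the class and method names are invented, whitespace tokenization is assumed, and the case-insensitive comparison only loosely mirrors `caseMatch = "ignoreall"`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only: NOT ConceptMapper's implementation.
// Shows the expected semantics of contiguous matching: a multi-token
// entry such as "FC Barcelona" matches only where ALL of its tokens occur
// next to each other, in order.
public class ContiguousMatchSketch {

    /** Returns the token indices at which the whole entry starts. */
    static List<Integer> findContiguousMatches(List<String> tokens, List<String> entry) {
        List<Integer> starts = new ArrayList<>();
        for (int i = 0; i + entry.size() <= tokens.size(); i++) {
            boolean allMatch = true;
            for (int j = 0; j < entry.size(); j++) {
                // Case-insensitive comparison (assumed stand-in for "ignoreall")
                if (!tokens.get(i + j).equalsIgnoreCase(entry.get(j))) {
                    allMatch = false;
                    break;
                }
            }
            if (allMatch) {
                starts.add(i);
            }
        }
        return starts;
    }

    public static void main(String[] args) {
        List<String> entry = Arrays.asList("FC", "Barcelona");
        List<String> partial = Arrays.asList("The", "FC", "scored", "today");
        List<String> full = Arrays.asList("FC", "Barcelona", "scored", "today");
        System.out.println(findContiguousMatches(partial, entry)); // prints []
        System.out.println(findContiguousMatches(full, entry));    // prints [0]
    }
}
```

Under these semantics, "FC" alone in "The FC scored today" produces no match, which is the behaviour expected from the ContiguousMatch strategy here.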

Why does ConceptMapper annotate "FC" on its own? Here are my parameters:

AnalysisEngineDescription mapper =
        AnalysisEngineFactory.createPrimitiveDescription(
                ConceptMapper.class,
                ts,
                ConceptMapper.PARAM_ANNOTATION_NAME, "org.apache.uima.conceptMapper.DictTerm",
                ConceptMapper.PARAM_ENCLOSINGSPAN, "enclosingSpan",
                ConceptMapper.PARAM_TOKENANNOTATION, "opennlp.uima.Token",
                ConceptMapper.PARAM_ATTRIBUTE_LIST, new String[] {"canonical"},
                ConceptMapper.PARAM_FEATURE_LIST, new String[] {"DictCanon"},
                ConceptMapper.PARAM_MATCHEDFEATURE, "matchedText",
                ConceptMapper.PARAM_TOKENIZERDESCRIPTOR, "TokenizerDE.xml",
                //ConceptMapper.PARAM_DATA_BLOCK_FS, "uima.tcas.DocumentAnnotation",
                ConceptMapper.PARAM_DATA_BLOCK_FS, "opennlp.uima.Sentence",
                ConceptMapper.PARAM_SEARCHSTRATEGY, "ContiguousMatch",
                ConceptMapper.PARAM_MATCHEDTOKENSFEATURENAME, "matchedTokens",
                TokenNormalizer.PARAM_CASE_MATCH, "ignoreall");

Thank you

Andreas
