Hello,
I am trying to use ConceptMapper to annotate multi-word units in German. The
problem is that individual tokens of a dictionary entry are matched on their
own, even when the full entry does not occur in the text. For example:
Dictionary entry: FC Barcelona
Text: "The FC scored today" -> "FC" is annotated as a DictEntry
Why does ConceptMapper annotate this? Here are my parameters:
AnalysisEngineDescription mapper =
    AnalysisEngineFactory.createPrimitiveDescription(
        ConceptMapper.class,
        ts,
        ConceptMapper.PARAM_ANNOTATION_NAME,
            "org.apache.uima.conceptMapper.DictTerm",
        ConceptMapper.PARAM_ENCLOSINGSPAN, "enclosingSpan",
        ConceptMapper.PARAM_TOKENANNOTATION, "opennlp.uima.Token",
        ConceptMapper.PARAM_ATTRIBUTE_LIST, new String[] {"canonical"},
        ConceptMapper.PARAM_FEATURE_LIST, new String[] {"DictCanon"},
        ConceptMapper.PARAM_MATCHEDFEATURE, "matchedText",
        ConceptMapper.PARAM_TOKENIZERDESCRIPTOR, "TokenizerDE.xml",
        //ConceptMapper.PARAM_DATA_BLOCK_FS, "uima.tcas.DocumentAnnotation",
        ConceptMapper.PARAM_DATA_BLOCK_FS, "opennlp.uima.Sentence",
        ConceptMapper.PARAM_SEARCHSTRATEGY, "ContiguousMatch",
        ConceptMapper.PARAM_MATCHEDTOKENSFEATURENAME, "matchedTokens",
        TokenNormalizer.PARAM_CASE_MATCH, "ignoreall");
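
To make the behavior I expected concrete, here is a minimal plain-Java sketch. It is not ConceptMapper code and does not claim to reflect how ConceptMapper works internally; the class name ContiguousMatchSketch and the method findMatches are made up purely for illustration. It only reports a match when every token of the dictionary entry occurs contiguously in the text, ignoring case.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only: NOT part of ConceptMapper or UIMA.
public class ContiguousMatchSketch {

    // Returns the start index of every position where ALL tokens of the
    // dictionary entry occur contiguously in the text (case-insensitive).
    public static List<Integer> findMatches(List<String> textTokens,
                                            List<String> entryTokens) {
        List<Integer> starts = new ArrayList<>();
        if (entryTokens.isEmpty()) {
            return starts;
        }
        for (int i = 0; i + entryTokens.size() <= textTokens.size(); i++) {
            boolean allMatch = true;
            for (int j = 0; j < entryTokens.size(); j++) {
                if (!textTokens.get(i + j).equalsIgnoreCase(entryTokens.get(j))) {
                    allMatch = false;
                    break;
                }
            }
            if (allMatch) {
                starts.add(i);
            }
        }
        return starts;
    }

    public static void main(String[] args) {
        List<String> entry = Arrays.asList("FC", "Barcelona");
        // Only "FC" is present, so I would expect no match here.
        System.out.println(findMatches(
            Arrays.asList("The", "FC", "scored", "today"), entry)); // prints []
        // The full entry is present, so a match at token 0 is expected.
        System.out.println(findMatches(
            Arrays.asList("FC", "Barcelona", "scored", "today"), entry)); // prints [0]
    }
}
```

In other words, I expected "The FC scored today" to produce no annotation for the entry "FC Barcelona", because only one of its two tokens appears.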
Thank you
Andreas