uima-user mailing list archives

From Michael Tanenblatt <sloth...@park-slope.net>
Subject Re: ConceptMApper
Date Wed, 20 Mar 2013 11:26:04 GMT
I have never seen this issue. Under no circumstances should anything less than the full dictionary
entry be matched. The only causes I can think of are errors in the dictionary, though
that's unlikely, issues with the tokenizer, or a bug. My guess is that the dictionary
entry "FC Barcelona" is being tokenized such that only "FC" is annotated, so that
is the only part that needs to match. You can test whether it is a tokenization issue by
swapping in the sample whitespace tokenizer that comes with ConceptMapper and seeing what results
you get.
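
To make the guess above concrete, here is a minimal, self-contained sketch in plain Java. It is not ConceptMapper's code and uses no UIMA classes; the class and method names (TokenizationCheck, whitespaceTokens, contiguousMatches) are made up for illustration. It assumes a contiguous match means "every token of the entry, in order, with nothing in between", and it simulates a faulty dictionary tokenization by dropping everything after the first token.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

public class TokenizationCheck {

    /** Splits on runs of whitespace and lower-cases each token (a stand-in for case-insensitive matching). */
    static List<String> whitespaceTokens(String text) {
        String trimmed = text.trim().toLowerCase(Locale.ROOT);
        if (trimmed.isEmpty()) {
            return new ArrayList<>();
        }
        return Arrays.asList(trimmed.split("\\s+"));
    }

    /** Returns the token positions in textTokens at which all entryTokens occur contiguously. */
    static List<Integer> contiguousMatches(List<String> entryTokens, List<String> textTokens) {
        List<Integer> hits = new ArrayList<>();
        if (entryTokens.isEmpty()) {
            return hits;
        }
        for (int i = 0; i + entryTokens.size() <= textTokens.size(); i++) {
            if (textTokens.subList(i, i + entryTokens.size()).equals(entryTokens)) {
                hits.add(i);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> text = whitespaceTokens("The FC scored today");

        // Entry tokenized as intended: two tokens, both must be present.
        List<String> fullEntry = whitespaceTokens("FC Barcelona");

        // Hypothetical faulty tokenization: only the first token survives.
        List<String> truncatedEntry = fullEntry.subList(0, 1);

        System.out.println("full entry matches: " + contiguousMatches(fullEntry, text));
        System.out.println("truncated entry matches: " + contiguousMatches(truncatedEntry, text));
    }
}
```

With the full two-token entry there is no match in "The FC scored today"; with the truncated one-token entry, "FC" matches on its own, which is the behaviour reported. If the real pipeline behaves correctly with the whitespace tokenizer, that points at the tokenizer descriptor rather than the dictionary.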

On Mar 20, 2013, at 7:09 AM, Andreas Niekler <aniekler@informatik.uni-leipzig.de> wrote:

> Hello,
> I am trying to use ConceptMapper to annotate multi-word units in German. The
> problem is that single tokens of a dictionary entry are matched on their own.
> For example:
> In the dict -> FC Barcelona
> In the text "The FC scored today", FC is annotated as DictEntry.
> Why does ConceptMapper annotate this? Here are my parameters:
> AnalysisEngineDescription mapper =
>     AnalysisEngineFactory.createPrimitiveDescription(
>         ConceptMapper.class,
>         ts,
>         ConceptMapper.PARAM_ANNOTATION_NAME, "org.apache.uima.conceptMapper.DictTerm",
>         ConceptMapper.PARAM_ENCLOSINGSPAN, "enclosingSpan",
>         ConceptMapper.PARAM_TOKENANNOTATION, "opennlp.uima.Token",
>         ConceptMapper.PARAM_ATTRIBUTE_LIST, new String[] {"canonical"},
>         ConceptMapper.PARAM_FEATURE_LIST, new String[] {"DictCanon"},
>         ConceptMapper.PARAM_MATCHEDFEATURE, "matchedText",
>         ConceptMapper.PARAM_TOKENIZERDESCRIPTOR, "TokenizerDE.xml",
>         //ConceptMapper.PARAM_DATA_BLOCK_FS, "uima.tcas.DocumentAnnotation",
>         ConceptMapper.PARAM_DATA_BLOCK_FS, "opennlp.uima.Sentence",
>         ConceptMapper.PARAM_SEARCHSTRATEGY, "ContiguousMatch",
>         ConceptMapper.PARAM_MATCHEDTOKENSFEATURENAME, "matchedTokens",
>         TokenNormalizer.PARAM_CASE_MATCH, "ignoreall");
> Thank you
> Andreas
