uima-user mailing list archives

From Michael Tanenblatt <sloth...@park-slope.net>
Subject Re: ConceptMapper and DictionaryAnnotator: what are the differences?
Date Mon, 21 Sep 2009 11:36:40 GMT
I am the author of ConceptMapper. It is our intention, as time
permits, to merge these two projects. Off the top of my head,
I can't speak to the functionality of DictionaryAnnotator, as our  
discussions about the two systems occurred quite some time ago, so I  
will just give you a summary of the features of ConceptMapper and  
someone else can supply the details of DictionaryAnnotator. Clearly,  
if you are confused about the differences, we should probably make  
these differences clearer on the sandbox site.

- ConceptMapper (CM) provides token-based dictionary lookup
- A tokenizer's AE descriptor is supplied as a parameter to CM to be  
used for tokenizing its dictionary, thereby assuring that the  
dictionary is tokenized in the same way as the input document
- Multi-token terms are allowed
- Any number of synonyms can be associated with an entry
- Numerous lookup strategies are supported, providing for simple
contiguous-token lookup, or allowing intervening tokens to be skipped  
between tokens that make up a multi-token term. This skipping can be  
controlled, skipping only tokens with certain feature values, or  
uncontrolled, skipping any.
- In addition to the other mechanisms for token skipping, you can  
supply a list of stop words to ignore during matching
- The XML-based dictionary can have any arbitrary set of features  
associated with an entry, and any (or all) of those features can be  
mapped to specific features of the resultant annotation. These can be  
associated at the granularity of individual synonyms, or with an  
entire entry. Synonym-specific features will override those specified  
for the dictionary entry if they have the same feature name.
- Lookup can also be set to allow out-of-order token lookup, thereby  
allowing {A} {B} {C} to match {C} {A} {B}
- The result can be the longest match, or all entries that match
against a token or set of tokens
- You can specify features from the dictionary to be written back to
the matching tokens
- Matching can be done against tokens' covered text, or against a
specified token-specific feature
- A stemmer can be applied to tokens before matching is performed
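
To make a few of the options above concrete (shared tokenization,
stop-word skipping, out-of-order lookup, longest match, and
synonym-level features overriding entry-level ones), here is a minimal
Python sketch. It is only an illustration of the ideas, not
ConceptMapper's code, API, or dictionary format: the function names, the
in-memory dictionary layout, and the sample entry are all hypothetical.

```python
from typing import Dict, FrozenSet, List, Optional, Tuple

# Hypothetical in-memory dictionary (NOT ConceptMapper's XML format).
DICTIONARY = [
    {
        "canonical": "myocardial infarction",
        "features": {"type": "disease", "source": "entry"},
        "synonyms": [
            {"text": "myocardial infarction"},
            {"text": "heart attack", "features": {"source": "synonym"}},
        ],
    },
]
STOP_WORDS = frozenset({"of", "the"})

Match = Tuple[int, int, str, Dict[str, str]]


def tokenize(text: str) -> List[str]:
    """Stand-in tokenizer, used for both the dictionary and the document."""
    return text.lower().split()


def match_at(tokens: List[str], start: int, term: List[str],
             stop_words: FrozenSet[str], any_order: bool) -> Optional[int]:
    """Return the exclusive end index if `term` matches at `start`, else None."""
    if not term or start >= len(tokens) or tokens[start] in stop_words:
        return None
    remaining = list(term)
    i = start
    while i < len(tokens) and remaining:
        tok = tokens[i]
        if tok in stop_words:          # skip stop words inside a term
            i += 1
            continue
        if any_order:                  # out-of-order: any remaining token may come next
            if tok not in remaining:
                return None
            remaining.remove(tok)
        else:                          # in-order: must be the next expected token
            if tok != remaining[0]:
                return None
            remaining.pop(0)
        i += 1
    return i if not remaining else None


def lookup(text: str, dictionary=DICTIONARY,
           stop_words: FrozenSet[str] = frozenset(),
           any_order: bool = False) -> List[Match]:
    """Longest-match lookup; returns (start, end, canonical, features) tuples."""
    tokens = tokenize(text)
    results: List[Match] = []
    i = 0
    while i < len(tokens):
        best: Optional[Match] = None
        for entry in dictionary:
            for syn in entry["synonyms"]:
                end = match_at(tokens, i, tokenize(syn["text"]),
                               stop_words, any_order)
                if end is not None and (best is None or end > best[1]):
                    # Synonym-level features override entry-level ones.
                    feats = {**entry["features"], **syn.get("features", {})}
                    best = (i, end, entry["canonical"], feats)
        if best is None:
            i += 1
        else:
            results.append(best)
            i = best[1]                # continue after the matched span
    return results


if __name__ == "__main__":
    print(lookup("patient had a heart attack"))
    print(lookup("infarction of the myocardial wall",
                 stop_words=STOP_WORDS, any_order=True))
```

In this sketch the second call only matches because both stop-word
skipping and out-of-order lookup are switched on; with the default
settings the same text produces no match.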

I think that covers everything. I hope this helps!

On Sep 21, 2009, at 6:59 AM, Roberto Franchini wrote:

> Hi to all,
> I'm exploring the ConceptMapper and the DictionaryAnnotator from the
> sandbox and I can't see very big differences in their purpose.
> Maybe the creators (donors) are going to merge these two projects, am
> I right?
> But, at this time, what's the best choice? And which is the better of the two?
> Regards,
> R.
> -- 
> Roberto Franchini
> http://www.celi.it
> http://www.blogmeter.it
> http://www.memesphere.it
> Tel +39-011-6600814
> jabber:ro.franchini@gmail.com skype:ro.franchini
