lucene-dev mailing list archives

From eks dev <eks...@yahoo.co.uk>
Subject Re: Edit-distance strategy (slicing and one vs. all algorithms)
Date Thu, 08 Jun 2006 19:03:10 GMT
Hi Bob, 

really nice answer!

>The real gain would be to do something like the
>edit-distance generalization of Aho-Corasick.  The
>basic idea is that instead of n iterations of string vs. string,
>you do one iteration of string vs. trie. 
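
One common way to realise the "string vs. trie" idea quoted above is to walk the trie depth-first, carrying one Levenshtein row per node and abandoning a branch as soon as every cell in its row exceeds the allowed distance. The sketch below is only an illustration of that idea, not the implementation discussed in this thread; the class name `TrieFuzzySearch`, its methods, and the plain `HashMap`-per-node layout are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: match one query against a whole dictionary held in a trie. */
public class TrieFuzzySearch {
    private static final class Node {
        final Map<Character, Node> children = new HashMap<>();
        String word; // non-null when a dictionary word ends at this node
    }

    private final Node root = new Node();

    /** Inserts a word into the trie, one node per character. */
    public void add(String word) {
        Node n = root;
        for (int i = 0; i < word.length(); i++) {
            n = n.children.computeIfAbsent(word.charAt(i), c -> new Node());
        }
        n.word = word;
    }

    /** Returns all stored words within maxDist edits (Levenshtein) of query. */
    public List<String> search(String query, int maxDist) {
        List<String> hits = new ArrayList<>();
        // Row for the empty prefix: distance to query[0..j) is j insertions.
        int[] row = new int[query.length() + 1];
        for (int j = 0; j < row.length; j++) {
            row[j] = j;
        }
        if (root.word != null && row[query.length()] <= maxDist) {
            hits.add(root.word);
        }
        for (Map.Entry<Character, Node> e : root.children.entrySet()) {
            walk(e.getValue(), e.getKey(), query, row, maxDist, hits);
        }
        return hits;
    }

    // Computes the DP row for the trie prefix ending in c, then recurses.
    private void walk(Node node, char c, String query, int[] prev,
                      int maxDist, List<String> hits) {
        int n = query.length();
        int[] row = new int[n + 1];
        row[0] = prev[0] + 1;
        int best = row[0];
        for (int j = 1; j <= n; j++) {
            int cost = query.charAt(j - 1) == c ? 0 : 1;
            row[j] = Math.min(Math.min(row[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            best = Math.min(best, row[j]);
        }
        if (node.word != null && row[n] <= maxDist) {
            hits.add(node.word);
        }
        // Prune: if no cell is within maxDist, no extension of this prefix can match.
        if (best <= maxDist) {
            for (Map.Entry<Character, Node> e : node.children.entrySet()) {
                walk(e.getValue(), e.getKey(), query, row, maxDist, hits);
            }
        }
    }
}
```

Words sharing a prefix share the DP rows computed for that prefix, which is where the saving over n separate string-vs-string comparisons comes from.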
 
I was experimenting a bit with a ternary trie, as it has some nice properties, e.g. being 40%
faster than the standard Java or Trove HashMap for exact matches, but I never got to finish path
compression and null-node reduction (which would save a huge amount of memory). Must do it
one day.
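
For readers unfamiliar with the structure, a ternary trie (ternary search tree) stores one character per node with three links: lower, equal, and higher. The following is a minimal sketch of exact-match put/get only, under the assumption of non-empty keys; it does not include path compression or null-node reduction, and the class name `TernaryTrie` and its fields are hypothetical, not the code the message refers to.

```java
/** Hypothetical minimal ternary search trie; exact-match lookups only. */
public class TernaryTrie<V> {
    private static final class Node<V> {
        final char c;       // character stored at this node
        Node<V> lo, eq, hi; // smaller char, next char of the key, larger char
        V value;            // non-null when a key ends at this node

        Node(char c) {
            this.c = c;
        }
    }

    private Node<V> root;

    /** Associates value with key; empty keys are rejected in this sketch. */
    public void put(String key, V value) {
        if (key.isEmpty()) {
            throw new IllegalArgumentException("empty key");
        }
        root = put(root, key, 0, value);
    }

    private Node<V> put(Node<V> n, String key, int i, V value) {
        char c = key.charAt(i);
        if (n == null) {
            n = new Node<>(c);
        }
        if (c < n.c) {
            n.lo = put(n.lo, key, i, value);
        } else if (c > n.c) {
            n.hi = put(n.hi, key, i, value);
        } else if (i < key.length() - 1) {
            n.eq = put(n.eq, key, i + 1, value);
        } else {
            n.value = value;
        }
        return n;
    }

    /** Returns the value stored for key, or null if the key is absent. */
    public V get(String key) {
        if (key.isEmpty()) {
            return null;
        }
        Node<V> n = root;
        int i = 0;
        while (n != null) {
            char c = key.charAt(i);
            if (c < n.c) {
                n = n.lo;
            } else if (c > n.c) {
                n = n.hi;
            } else if (i < key.length() - 1) {
                n = n.eq;
                i++;
            } else {
                return n.value;
            }
        }
        return null;
    }
}
```

Each node here holds three references whether or not they are used, which is the memory overhead that path compression and null-node reduction are meant to cut down.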

Can you share how you implemented the trie in your app? The especially interesting part for me is
how you handle memory consumption. Have you tried really large dictionaries (1M+ entries)?

thanks!





