lucene-java-user mailing list archives

From David Spencer <dave-lucene-u...@tropo.com>
Subject Re: combining open office spellchecker with Lucene
Date Thu, 09 Sep 2004 16:01:12 GMT
Aad Nales wrote:

> Hi All,
> 
> Before I start reinventing wheels I would like to do a short check to
> see if anybody else has already tried this. A customer has requested us
> to look into the possibility to perform a spell check on queries. So far
> the most promising way of doing this seems to be to create an Analyzer
> based on the spellchecker of OpenOffice. My question is: "has anybody
> tried this before?" 

I wrote a WordNet/synonym query expander; search for "WordNet" on the
page below. Of interest is that it stores the WordNet data in a separate
Lucene index, since at its essence an index is just a database.

http://jakarta.apache.org/lucene/docs/lucene-sandbox/

Another variation is to base the spelling suggestions on the terms that
are in the index rather than on an external dictionary. I've done this
on my experimental site searchmorph.com in a dumb/inefficient way.
Here's an example:

http://www.searchmorph.com/kat/search.jsp?s=recursivz

After you click the link above it takes ~10 sec to produce terms close
to "recursivz". Oops, looking at the output, it looks like the same word
is suggested multiple times. Ouch. I must be considering all fields, not
just the contents field. Fixing this is TBD (and no wonder it's so slow :)).

I can/should send the code out. The logic is: for any term in a query
that has zero matches, go through all the terms(!) in the index,
calculate the Levenshtein string distance to each, and return the best
matches. A more intelligent way of doing this is to only consider terms
that also match on the first "n" (probably 3) chars.
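
To make the logic above concrete, here is a minimal sketch in plain Java.
It is not the code behind searchmorph.com and it does not touch the Lucene
API: the class name TermSuggester, its method names, and the idea of
passing the index terms in as an Iterable<String> are all assumptions made
for illustration. In a real setup the terms would come from enumerating
one field of the index. Collecting candidates in a set also avoids
suggesting the same word more than once.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper for illustration; not part of Lucene.
class TermSuggester {

    // Levenshtein edit distance using two rolling rows of the DP table.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // turning "" into b[0..j) takes j insertions
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // turning a[0..i) into "" takes i deletions
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Returns up to maxSuggestions terms closest to 'word'.
    // prefixLen > 0 keeps only terms sharing the first prefixLen chars
    // with 'word'; prefixLen == 0 scans every term (the slow variant).
    static List<String> suggest(String word, Iterable<String> indexTerms,
                                int prefixLen, int maxSuggestions) {
        String prefix = word.substring(0, Math.min(prefixLen, word.length()));
        Set<String> unique = new TreeSet<>(); // drops duplicate terms
        for (String term : indexTerms) {
            if (term.startsWith(prefix) && !term.equals(word)) {
                unique.add(term);
            }
        }
        List<String> candidates = new ArrayList<>(unique);
        // Closest first; ties are broken alphabetically.
        candidates.sort(Comparator
                .comparingInt((String t) -> levenshtein(word, t))
                .thenComparing(Comparator.<String>naturalOrder()));
        return new ArrayList<>(
                candidates.subList(0, Math.min(maxSuggestions, candidates.size())));
    }

    public static void main(String[] args) {
        // Made-up term list standing in for the terms of one index field.
        List<String> terms = Arrays.asList(
                "recursive", "recursion", "recurse", "banana", "recursive");
        System.out.println(suggest("recursivz", terms, 3, 2));
    }
}
```

The prefix filter is a trade-off: it cuts down the number of distance
calculations a lot, but it will miss corrections where the typo is in
the first few characters.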




> 
> Cheers,
> Aad
> 
> 
> --
> Aad Nales
> aad.nales@rotterdam-cs.com, +31-(0)6 54 207 340 
> 
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
> For additional commands, e-mail: lucene-user-help@jakarta.apache.org
> 

