lucene-java-user mailing list archives

From Tom White
Subject Re: Did you mean?
Date Tue, 30 Aug 2005 08:14:47 GMT
On 8/29/05, Chris Lu wrote:
> Two approaches I can think of:
> * Use a word list(it may not be the word list you want, but it is just
> a compromise).
> * Analyze your original index, listing out all words inside.
Using a word list suffers from two problems:
1. (Coverage) No word list is complete. Word lists need to be maintained as
new words are coined, and word lists for different languages vary in quality.
2. (Useless suggestions) There is little point in suggesting words that
aren't in the original index, as they wouldn't produce any hits.

For these reasons it is better to use the original index as the source of 
words. It is true that the index will likely contain spelling errors; 
however, Lucene Spell Checker provides a way to restrict suggestions to words 
that are more popular than the query term. As misspellings are typically 
rarer than correct spellings, this should ensure that misspelled suggestions 
are almost never made. The article quoted above (which I wrote) provides a 
bit more discussion.
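
To make the "more popular than the query term" idea concrete, here is a rough sketch in plain Python. It is not the Lucene Spell Checker API: the names suggest, edit_distance and doc_freq are made up for illustration, the term frequencies are invented sample data, and plain Levenshtein distance stands in for whatever similarity measure the real spell checker uses.

```python
from collections import Counter


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed with two rolling rows."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def suggest(query, doc_freq, max_distance=2, limit=3):
    """Suggest index terms close to `query` that are strictly more frequent.

    `doc_freq` maps each term in the index to its frequency (hypothetical
    data here; in practice it would be read from the index itself).
    """
    query_freq = doc_freq.get(query, 0)
    candidates = []
    for word, freq in doc_freq.items():
        # Skip the query itself and anything not more popular than it.
        if word == query or freq <= query_freq:
            continue
        distance = edit_distance(query, word)
        if distance <= max_distance:
            # Sort by closeness first, then by descending frequency.
            candidates.append((distance, -freq, word))
    return [word for _, _, word in sorted(candidates)[:limit]]


# Invented frequencies: "lucine" is a rare misspelling present in the index.
doc_freq = Counter({"lucene": 120, "lucine": 2, "license": 40, "search": 300})

print(suggest("lucine", doc_freq))  # ['lucene']
print(suggest("lucene", doc_freq))  # []
```

Because the words come from the index, every suggestion is guaranteed to produce hits, and the frequency filter means the rare misspelling "lucine" is never offered as a correction for the far more common "lucene".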

