lucene-java-user mailing list archives

From Doron Cohen <DOR...@il.ibm.com>
Subject Re: SpellChecker performance and usage
Date Tue, 04 Dec 2007 05:59:44 GMT
I didn't have performance issues when using the spell checker.
Can you describe what you tried and how long it took, so
people can relate to that.

AFAIK the spell checker in o.a.l.search.spell does not "expand
a query by adding all the permutations of potentially misspelled
word". Instead, it builds an auxiliary index whose *documents*
are the *words* of the main index, each tokenized into n-grams.
A word being checked is tokenized the same way, and its n-grams
are used as a query on the auxiliary index.

The real query tokenization is more elaborate than this,
but a simplified example can help show how it works:
- a misspelled word 'helo' is tokenized as 'he el lo',
- the auxiliary index contains a document for the correct
  word "hello" that was tokenized as 'he el ll lo'
- the score of the document 'hello' would be high when searching
  the auxiliary index for 'he el lo'.
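For illustration, here is a minimal, self-contained sketch of the same idea. It is not the Lucene SpellChecker code: the class NGramSpellIndex and its methods are hypothetical. It keeps the auxiliary "index" as an in-memory map from n-gram to words, uses fixed-size bigrams only, and ranks candidates by the number of shared grams instead of Lucene's scoring.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical in-memory stand-in for the auxiliary spell index: one "document" per word. */
public class NGramSpellIndex {
    private final int n;
    // n-gram -> words (auxiliary "documents") containing that n-gram
    private final Map<String, Set<String>> postings = new HashMap<>();

    public NGramSpellIndex(int n) {
        this.n = n;
    }

    /** Splits a word into its distinct overlapping n-grams, e.g. "helo" -> [he, el, lo] for n = 2. */
    public static Set<String> grams(String word, int n) {
        Set<String> out = new LinkedHashSet<>();
        for (int i = 0; i + n <= word.length(); i++) {
            out.add(word.substring(i, i + n));
        }
        return out;
    }

    /** Adds one word of the main index's lexicon as a document of the auxiliary index. */
    public void addWord(String word) {
        for (String g : grams(word, n)) {
            postings.computeIfAbsent(g, k -> new HashSet<>()).add(word);
        }
    }

    /** Uses the checked word's n-grams as the "query" and ranks words by shared n-grams. */
    public List<String> suggest(String word, int max) {
        Map<String, Integer> hits = new HashMap<>();
        for (String g : grams(word, n)) {
            for (String candidate : postings.getOrDefault(g, Set.of())) {
                hits.merge(candidate, 1, Integer::sum);
            }
        }
        List<String> ranked = new ArrayList<>(hits.keySet());
        ranked.remove(word); // a word should not be suggested as a correction of itself
        ranked.sort((a, b) -> {
            int byHits = Integer.compare(hits.get(b), hits.get(a)); // more shared n-grams first
            return byHits != 0 ? byHits : a.compareTo(b);          // ties: alphabetical
        });
        return ranked.subList(0, Math.min(max, ranked.size()));
    }

    public static void main(String[] args) {
        NGramSpellIndex index = new NGramSpellIndex(2);
        for (String w : List.of("hello", "help", "yellow", "world")) {
            index.addWord(w);
        }
        System.out.println(index.suggest("helo", 3)); // prints [hello, help, yellow]
    }
}
```

Here 'hello' shares all three bigrams of 'helo' (he, el, lo), while 'help' and 'yellow' share two each, so 'hello' ranks first. Note that a lookup only touches words sharing at least one n-gram with the checked word; nothing like a set of spelling permutations is ever generated.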

The only performance hit comes when refreshing/rebuilding the
auxiliary index after the lexicon of the main index has changed
significantly. But this can be done in the background, when
appropriate for the application using Lucene and the spell checker.
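A minimal sketch of doing that rebuild in the background, assuming the application can tolerate slightly stale suggestions until the rebuild finishes. BackgroundRebuilder and its methods are hypothetical names, and a sorted set of words stands in for the real auxiliary index; only the pattern matters here: build the new index on a worker thread from a snapshot of the lexicon, then swap it in atomically so lookups never wait for the rebuild.

```java
import java.util.Collections;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Function;

/**
 * Hypothetical helper: rebuilds an auxiliary index on a background thread and
 * swaps it in atomically, so lookups keep using the old index in the meantime.
 */
public class BackgroundRebuilder<T> {
    private final Function<List<String>, T> build; // builds an index from a lexicon snapshot
    private final AtomicReference<T> current;
    // Single worker thread, so two rebuilds never run at the same time.
    private final ExecutorService worker = Executors.newSingleThreadExecutor();

    public BackgroundRebuilder(Function<List<String>, T> build, List<String> initialLexicon) {
        this.build = build;
        this.current = new AtomicReference<>(build.apply(List.copyOf(initialLexicon)));
    }

    /** The index lookups should use right now (the old one until a rebuild completes). */
    public T current() {
        return current.get();
    }

    /** Queues a rebuild from a copy of the lexicon and returns without waiting. */
    public Future<?> rebuildAsync(List<String> lexicon) {
        List<String> snapshot = List.copyOf(lexicon);
        return worker.submit(() -> current.set(build.apply(snapshot)));
    }

    public void shutdown() {
        worker.shutdown();
    }

    public static void main(String[] args) throws Exception {
        // Stand-in "index": an unmodifiable sorted set of words.
        Function<List<String>, Set<String>> toIndex =
                words -> Collections.unmodifiableSet(new TreeSet<>(words));
        BackgroundRebuilder<Set<String>> spell =
                new BackgroundRebuilder<>(toIndex, List.of("hello", "help"));

        Future<?> pending = spell.rebuildAsync(List.of("hello", "help", "world"));
        // Lookups made here still succeed, against whichever index is current.
        pending.get(); // only the demo blocks, to show the swap has happened
        System.out.println(spell.current().contains("world")); // prints true
        spell.shutdown();
    }
}
```

The AtomicReference swap means a reader sees either the complete old index or the complete new one, never a half-built one.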

Doron

smokey <smokeystu@gmail.com> wrote on 03/12/2007 17:23:21:

> My question is for anyone who has experience with Lucene's SpellChecker,
> especially around its performance characteristics/ramifications.
>
> 1. Given the fact that SpellChecker expands a query by adding all the
> permutations of potentially misspelled word, how does it
> perform in general?
>
> 2. How are others handling the case where SpellChecker would NOT perform
> well if you expand the query adding all the permutations? In other words,
> what kind of techniques are people using to get around or alleviate the
> performance hit if any?
>
> Any sharing of information or pointers would be appreciated.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

