lucene-dev mailing list archives

From Robert Muir
Subject Re: Spellchecker design was Re: Solr 3.1 back compat
Date Tue, 26 Oct 2010 12:39:52 GMT
On Tue, Oct 26, 2010 at 8:19 AM, Andrzej Bialecki wrote:
> Sometimes you want a dictionary that is cleaned up and re-weighted by an
> external process (human-based or other), even if it originally came from
> your index. So it's not either/or - you can have a file-based dictionary
> that nonetheless gives you stuff that _is_ in your index.

Right, and I would like to support this in my spellchecker via DFA
intersection at runtime (intersecting the cleaned-up DFA with the
Levenshtein query DFA). The underlying "dictionary" (the Lucene index)
would be unchanged; the cleaned-up DFA would instead act as a filter.
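
To make the filter idea concrete, here is a minimal sketch, not the actual spellchecker code. It assumes a toy Dfa class (hypothetical, not a Lucene class) built from a word list, and it uses a hand-listed word set as a stand-in for the Levenshtein query DFA, since building a real one is out of scope here. Running both DFAs in lockstep over a term is equivalent to running their product (intersection) automaton.

```java
import java.util.*;

class DfaIntersectionSketch {
    /** Toy DFA (hypothetical helper): state 0 is the start, missing edges reject. */
    static final class Dfa {
        final Map<Integer, Map<Character, Integer>> delta = new HashMap<>();
        final Set<Integer> accept = new HashSet<>();
        int size = 1; // number of states allocated so far

        Integer step(int state, char c) {
            Map<Character, Integer> m = delta.get(state);
            return m == null ? null : m.get(c);
        }

        /** Builds a trie-shaped DFA that accepts exactly the given words. */
        static Dfa fromWords(String... words) {
            Dfa d = new Dfa();
            for (String w : words) {
                int s = 0;
                for (char c : w.toCharArray()) {
                    Integer t = d.step(s, c);
                    if (t == null) {
                        t = d.size++;
                        d.delta.computeIfAbsent(s, k -> new HashMap<>()).put(c, t);
                    }
                    s = t;
                }
                d.accept.add(s);
            }
            return d;
        }
    }

    /** True only if both DFAs accept s: a lazy walk of the product automaton. */
    static boolean acceptsBoth(Dfa a, Dfa b, String s) {
        int sa = 0, sb = 0;
        for (char c : s.toCharArray()) {
            Integer na = a.step(sa, c), nb = b.step(sb, c);
            if (na == null || nb == null) return false; // dead state in either DFA
            sa = na;
            sb = nb;
        }
        return a.accept.contains(sa) && b.accept.contains(sb);
    }

    /** Keeps only the index terms accepted by both DFAs; the term list itself is untouched. */
    static List<String> filter(List<String> indexTerms, Dfa a, Dfa b) {
        List<String> out = new ArrayList<>();
        for (String t : indexTerms) {
            if (acceptsBoth(a, b, t)) out.add(t);
        }
        return out;
    }

    public static void main(String[] args) {
        Dfa cleaned = Dfa.fromWords("cat", "cart", "dog");       // curated word list
        Dfa query = Dfa.fromWords("cat", "cot", "cart", "bat");  // stand-in for a Levenshtein DFA
        List<String> indexTerms = Arrays.asList("bat", "cart", "cat", "cot", "dog");
        System.out.println(filter(indexTerms, cleaned, query));
    }
}
```

The point of the sketch is that the index terms are never rewritten; the curated DFA only narrows which of them can come back as suggestions.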

It would be nice if the concept were somehow more general, and for the
other spellcheckers it could be *implemented* via Dictionary, but that
shouldn't be the only way.

> (Yeah, and sorted vs. unsorted ... I tried to hack it by tagging some
> classes with a SortedIterator, but it was indeed a half-hearted
> attempt... it needs to be fixed, not worked around).

It would be cool to add this to Lucene in the short term, so we could
mark the LuceneDictionary as being in sorted order. Then we could
explore the TermEnum optimization I spoke of, rather than calling
IndexReader.docFreq() on the spellcheck index for every term in the
dictionary to see if it already exists.

Yeah, I know that if they are sorted they will tend to be in the same TII
block, and the term dictionary cache will generally work, but I think
it would still end up faster... and there would be no need to completely
hose the term dictionary cache to rebuild a spellcheck index.
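
A rough sketch of the sorted-walk idea, using plain Java iterators as stand-ins for the dictionary iterator and the spellcheck index's term enumeration (the class and method names are hypothetical, and no Lucene API is used). It assumes both inputs are sorted ascending in the same order, which is exactly what marking the dictionary as sorted would guarantee. One forward pass over both replaces a per-term existence lookup.

```java
import java.util.*;

class SortedMergeSketch {
    /**
     * Returns the dictionary words that are absent from the existing terms.
     * Both iterators must be sorted ascending in the same order (assumption).
     */
    static List<String> missing(Iterator<String> dictionary, Iterator<String> existing) {
        List<String> out = new ArrayList<>();
        String cur = existing.hasNext() ? existing.next() : null;
        while (dictionary.hasNext()) {
            String word = dictionary.next();
            // Advance the existing-terms cursor until it reaches or passes the word.
            while (cur != null && cur.compareTo(word) < 0) {
                cur = existing.hasNext() ? existing.next() : null;
            }
            if (cur == null || !cur.equals(word)) {
                out.add(word); // not in the spellcheck index yet
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> dictionary = Arrays.asList("apple", "banana", "cherry", "date");
        List<String> existing = Arrays.asList("banana", "date", "fig");
        System.out.println(missing(dictionary.iterator(), existing.iterator()));
    }
}
```

Each side is read once and only forward, so the walk never does random lookups into the term dictionary.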
