lucene-dev mailing list archives

From Robert Muir <rcm...@gmail.com>
Subject Re: Spellchecker design was Re: Solr 3.1 back compat
Date Tue, 26 Oct 2010 11:24:12 GMT
On Tue, Oct 26, 2010 at 6:59 AM, Grant Ingersoll <gsingers@apache.org> wrote:
>> I felt the entire framework in Solr is built around the idea of  "take
>> stuff from one field in an index, shove it into another field of an
>> index", but my spellchecker doesn't need any of this.
>>
>
> Not really, but...

Really, I think? I can only "see" part of the query (I think one field
at a time) via Tokens...

>
> I guess no one has upgraded it yet.  This is 1.3 stuff.  I don't have
> any problem with upgrading it.

I'm not saying we have to use the Attributes API, it was just an idea.
But we really have to move this component from
"solr-makes-the-decisions" to "user-makes-the-decisions". This is
the number 1 problem with the current spellchecker (ok, maybe #2, #1
being that the index-based one doesn't close its IndexReader).

>
>>
>> Even the input format that comes into the spellchecker in
>> getSuggestions(SpellingOptions options) is just Tokens, but this is
>> pretty limiting. For instance, I think it makes way more sense for a
>> spellchecker API to take Query and return corrected Querys, and in my
>> situation i could give better results, but the Solr APIs stop me.
>
> And you are then going to do Query.toString() to display that back to the user?

Why do you care? Maybe that works fine for me: I don't use the dismax
parser that generates horrific queries, so everything is fine... and
that's my point... something more like a pipeline/attributes-based
thing would work much better here; it's up to the user.

Certainly it makes sense to keep the original query around... why hide
it? And the hairy mess of code that converts it into tokens needs to
be something like a pipeline, because some people don't want it, or
want to do it their own way.
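
To make the pipeline idea concrete, here is a minimal sketch, not a
proposal for the actual Solr API. Every name in it
(SpellcheckPipelineSketch, Context, WHITESPACE_TERMS) is hypothetical.
It only shows the shape: the original query stays visible to every
stage, and the query-to-tokens step is just one optional stage that
the user can drop or replace.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical sketch; none of these types exist in Solr or Lucene. */
final class SpellcheckPipelineSketch {

    /** State shared by all stages; the original query is never hidden. */
    static final class Context {
        final String originalQuery;
        final List<String> terms = new ArrayList<>();

        Context(String originalQuery) {
            this.originalQuery = originalQuery;
        }
    }

    /** Runs only the stages the caller supplied, in the order given. */
    static Context run(String query, List<Consumer<Context>> stages) {
        Context ctx = new Context(query);
        for (Consumer<Context> stage : stages) {
            stage.accept(ctx);
        }
        return ctx;
    }

    /** One optional stage: naive whitespace splitting (an assumption, easily replaced). */
    static final Consumer<Context> WHITESPACE_TERMS =
            ctx -> ctx.terms.addAll(Arrays.asList(ctx.originalQuery.trim().split("\\s+")));
}
```

Calling run(query, List.of()) skips tokenization entirely, and a user
who wants their own conversion passes their own stage instead of
WHITESPACE_TERMS.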

And, let's say I have a hunspell dictionary for my language... how do I
plug this in? I don't want it to implement Dictionary, because I'm not
stupid enough to return something that's not in my index (see below);
maybe I only want to use it as a 'filter' to prevent suggestions that
are spelled incorrectly...
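
A rough sketch of the 'filter' role, under stated assumptions: the
class and method names are hypothetical, and a plain
Predicate<String> stands in for whatever "is this word accepted by my
hunspell dictionary?" check is available. Candidates still come only
from the index; the external dictionary never contributes
suggestions, it only vetoes them.

```java
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

/** Hypothetical sketch; not part of the Solr or Lucene spellchecker API. */
final class SuggestionFilterSketch {

    /**
     * Keeps only the index-derived candidates that the external
     * dictionary accepts. The original candidate order is preserved.
     */
    static List<String> filter(List<String> candidatesFromIndex,
                               Predicate<String> acceptedByDictionary) {
        return candidatesFromIndex.stream()
                .filter(acceptedByDictionary)
                .collect(Collectors.toList());
    }
}
```

For example, with a stand-in word list Set.of("receive", "recipe"),
filter(List.of("recieve", "receive", "recipe"), wordList::contains)
drops the misspelled "recieve" even though it occurs in the index.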


We really need to seriously clean house on the spellchecker stuff
(Lucene too), and to answer your question: if we can fix these APIs in
any way, I'm all for just doing a backwards break, because I think the
existing APIs are completely broken.

For example, the whole index-based spellchecker in Lucene has bad
performance because its APIs were made overly generic:
I think it's important that it doesn't call docFreq() on every single
term in the Dictionary when rebuilding; it should walk a TermEnum in
parallel.
But it can't do this, because it can't assume the Dictionary is in
sorted order!?
I guess that's because the "Dictionary" idea was made overly generic,
abstracted into useless PlainTextDictionary and LuceneDictionary.

PlainTextDictionary? useless... why the hell would you return
something that isn't in your index?!
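
To illustrate the "walk in parallel" point, here is a minimal sketch
that deliberately avoids the real Lucene classes: two sorted lists of
strings stand in for the Dictionary and the TermEnum, and the class
and method names are hypothetical. If both sides are sorted in the
same order, one forward pass over each finds the dictionary words the
index does not have yet, with no per-word lookup.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/** Hypothetical sketch; sorted lists stand in for Dictionary and TermEnum. */
final class SortedMergeWalk {

    /**
     * Returns the dictionary words that are absent from the index terms.
     * Both inputs must be sorted by String.compareTo; that assumption
     * is exactly what the generic Dictionary contract does not give us.
     */
    static List<String> missingFromIndex(List<String> sortedDictionary,
                                         List<String> sortedIndexTerms) {
        List<String> missing = new ArrayList<>();
        Iterator<String> index = sortedIndexTerms.iterator();
        String current = index.hasNext() ? index.next() : null;

        for (String word : sortedDictionary) {
            // Advance the index cursor until it reaches or passes this word.
            while (current != null && current.compareTo(word) < 0) {
                current = index.hasNext() ? index.next() : null;
            }
            if (current == null || !current.equals(word)) {
                missing.add(word);
            }
        }
        return missing;
    }
}
```

The cost is one pass over each side, instead of one lookup per
dictionary word, and the index cursor never moves backwards.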


