lucene-dev mailing list archives

From Grant Ingersoll <gsing...@apache.org>
Subject Spellchecker design was Re: Solr 3.1 back compat
Date Tue, 26 Oct 2010 10:59:10 GMT

On Oct 25, 2010, at 10:14 PM, Robert Muir wrote:

> On Mon, Oct 25, 2010 at 9:42 PM, Grant Ingersoll <gsingers@apache.org> wrote:
>> As part of https://issues.apache.org/jira/browse/SOLR-2080, I'd like to rework the
>> SpellCheckComponent just a bit to be more generic.  I think I can maintain the URL APIs (i.e.
>> &spellcheck.*) in a back compatible way, but I would like to change some of the Java classes
>> a bit, namely SolrSpellChecker and related, to be reusable and reflect the commonality of the
>> solutions.  The way I see it, spell checking, auto suggest and related search suggestions
>> are all just suggestions.  We have much of the framework of this in place, other than a few
>> things at the Java level that are named after spell checking.  I know we generally don't worry
>> too much about Java interfaces in Solr, but this seems like one area where people do sometimes
>> write their own.  The changes will be mostly renaming commonalities from "spellcheck" to "suggester"
>> (or something similar) and so I don't see it as particularly hard to make the change, but
>> it would require some code changes.  What do people think?  My other option would be to factor
>> out as much commonality as possible into helper classes, but that doesn't feel as clean.
>> 
>> 
> 
> Almost certainly not what you are looking for,

Yeah, that pretty much doesn't answer a single question I asked, but nonetheless I'm happy to
discuss a better design.  We really should discuss it on another thread.

> but I'm gonna complain
> anyway from my experience of trying to write a Solr spellchecker
> recently.
> Note: I didn't take the time to actually try to learn these APIs a lot,
> so maybe I'm completely off-base, but this is what it looked like to
> me:
> 
> I felt the entire framework in Solr is built around the idea of  "take
> stuff from one field in an index, shove it into another field of an
> index", but my spellchecker doesn't need any of this.
> 

Not really, but...



> Configuring it for different fields is a pain in the ass, if you have
> many, but really the field could and should be a query-time parameter.

In fact, the SpellingOptions allows this.  You should look at the customParams piece.  You
can pass in arbitrary query time parameters.
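
A rough sketch of what I mean by treating the field as a query-time parameter. This is
only an illustration: a plain Map stands in for the custom parameters object so the
snippet compiles without Solr, and the parameter name "field" and the class name
FieldParamSketch are made up for the example, not existing Solr names.

```java
import java.util.Map;

final class FieldParamSketch {
    // Hypothetical parameter name, chosen for this example only.
    static final String FIELD_PARAM = "field";

    // Returns the field requested at query time, or the configured
    // default when the request did not send one.
    static String resolveField(Map<String, String> customParams, String configuredField) {
        String requested = customParams == null ? null : customParams.get(FIELD_PARAM);
        return (requested == null || requested.isEmpty()) ? configuredField : requested;
    }
}
```

A spellchecker written this way would still take a default field from its configuration
and only override it when the request says so.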

> 
> The spellchecking APIs have a weird response format "Map<Token,
> LinkedHashMap<String, Integer>>" which really just means you can only
> provide text and docfreq, but I wanted to return the score, too... so
> for now it just gets discarded.

That kind of stuff can and should be changed.  Those are internal APIs.  If you want score
in there, then we should change it to something like Map<Token, Map<String, SuggestionInfo>>
where SuggestionInfo (or whatever you want to call it) contains freq, score, etc.
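
Something along these lines, as a minimal sketch and not a finished API. SuggestionInfo
and its fields are just the placeholder names from this thread, SuggestionSketch is made
up for the example, and String stands in for Token so the snippet compiles without
Lucene on the classpath.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical value type; name and fields are placeholders, not an existing class.
final class SuggestionInfo {
    final int freq;     // document frequency of the suggested term
    final float score;  // spellchecker-specific score

    SuggestionInfo(int freq, float score) {
        this.freq = freq;
        this.score = score;
    }
}

final class SuggestionSketch {
    // Builds an example result: misspelled token -> (suggestion -> info).
    // LinkedHashMap keeps suggestions in the order they were added.
    static Map<String, Map<String, SuggestionInfo>> example() {
        Map<String, SuggestionInfo> forToken = new LinkedHashMap<>();
        forToken.put("lucene", new SuggestionInfo(42, 0.9f));
        forToken.put("lucent", new SuggestionInfo(3, 0.6f));

        Map<String, Map<String, SuggestionInfo>> result = new LinkedHashMap<>();
        result.put("lucine", forToken);
        return result;
    }
}
```

The point is only that each suggestion carries a small object instead of a bare Integer,
so score (or anything else) can be added later without changing the map's shape again.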

> 
> we are still using Token everywhere, again, which is bad news if we
> want to do more complex things later... like it would really make
> sense to switch to the attributes API if this stuff needs to be
> flexible.

I guess no one has upgraded it yet.  This is 1.3 stuff.  I don't have any problem with upgrading
it.

> 
> Even the input format that comes into the spellchecker in
> getSuggestions(SpellingOptions options) is just Tokens, but this is
> pretty limiting. For instance, I think it makes way more sense for a
> spellchecker API to take Query and return corrected Querys, and in my
> situation i could give better results, but the Solr APIs stop me.

And you are then going to do Query.toString() to display that back to the user?  

> 
> Apparently the whole Collator thing is designed to "do this for me",
> but i have my own ideas (since my impl is new and different), only i'm
> not able to implement them... I don't know how the hell it could be
> doing this since i can't return the score.
> 
> I realize i could have completely discarded all the spellchecking
> APIs, written a ton of code/re-invented wheels, and probably gotten
> what i wanted, but i just wimped out and committed a shitty
> spellchecker instead.

Or you could ask questions and we could discuss how to improve it.  We probably could get
you what you want w/o that much of a change.
	

