lucene-dev mailing list archives

From Robert Muir <rcm...@gmail.com>
Subject Re: Solr 3.1 back compat
Date Tue, 26 Oct 2010 02:14:34 GMT
On Mon, Oct 25, 2010 at 9:42 PM, Grant Ingersoll <gsingers@apache.org> wrote:
> As part of https://issues.apache.org/jira/browse/SOLR-2080, I'd like to
> rework the SpellCheckComponent just a bit to be more generic.  I think I
> can maintain the URL APIs (i.e. &spellcheck.*) in a back compatible way,
> but I would like to change some of the Java classes a bit, namely
> SolrSpellChecker and related, to be reusable and reflect the commonality
> of the solutions.  The way I see it, spell checking, auto suggest and
> related search suggestions are all just suggestions.  We have much of the
> framework of this in place, other than a few things at the Java level
> that are named after spell checking.  I know we generally don't worry too
> much about Java interfaces in Solr, but this seems like one area where
> people do sometimes write their own.  The changes will be mostly renaming
> commonalities from "spellcheck" to "suggester" (or something similar) and
> so I don't see it as particularly hard to make the change, but it would
> require some code changes.  What do people think?  My other option would
> be to factor out as much commonality as possible into helper classes, but
> that doesn't feel as clean.
>
>

Almost certainly not what you are looking for, but I'm gonna complain
anyway, based on my recent experience of trying to write a Solr
spellchecker.
Note: I didn't take the time to really learn these APIs, so maybe I'm
completely off-base, but this is what it looked like to me:

I felt the entire framework in Solr is built around the idea of "take
stuff from one field in an index, shove it into another field of an
index", but my spellchecker doesn't need any of this.

Configuring it for different fields is a pain in the ass if you have
many of them; really, the field could and should be a query-time
parameter.
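
To make the query-time idea concrete, here is a minimal sketch in plain
Java. It assumes request parameters arrive as a simple string map, and the
parameter name "spellcheck.field" is hypothetical, chosen only for
illustration; it is not claimed to be an existing Solr parameter.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch only: pick the spellcheck field per request, falling back to config.
class QueryTimeField {
    // Hypothetical parameter name, used here for illustration only.
    static final String FIELD_PARAM = "spellcheck.field";

    /** Returns the field named in the request, or the configured default if absent or blank. */
    static String resolveField(Map<String, String> requestParams, String configuredDefault) {
        String field = requestParams.get(FIELD_PARAM);
        if (field == null || field.trim().isEmpty()) {
            return configuredDefault;
        }
        return field.trim();
    }

    public static void main(String[] args) {
        Map<String, String> params = new HashMap<>();
        System.out.println(resolveField(params, "text"));   // no override in the request
        params.put(FIELD_PARAM, "title");
        System.out.println(resolveField(params, "text"));   // request overrides the config
    }
}
```

With something like this, one spellchecker configuration could serve many
fields, and the per-field configuration would only supply a default.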

The spellchecking APIs have a weird response format, "Map<Token,
LinkedHashMap<String, Integer>>", which really just means you can only
provide text and docfreq for each suggestion. I wanted to return the
score, too... so for now it just gets discarded.
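
As an illustration of what gets lost, here is a minimal sketch of a result
type that keeps the score next to the text and docfreq. Everything in it
(ScoredSuggestions, Suggestion, toLegacy) is hypothetical and not part of
Solr, and it keys on a plain String instead of Token so that it runs
without Lucene on the classpath.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch only: a suggestion result that carries a score per candidate.
class ScoredSuggestions {
    /** Hypothetical value type: one candidate with its docfreq and score. */
    static final class Suggestion {
        final String text;
        final int docFreq;
        final float score;

        Suggestion(String text, int docFreq, float score) {
            this.text = text;
            this.docFreq = docFreq;
            this.score = score;
        }
    }

    // Input term -> candidates, kept in the order they were added.
    private final Map<String, List<Suggestion>> byTerm = new LinkedHashMap<>();

    void add(String term, String text, int docFreq, float score) {
        byTerm.computeIfAbsent(term, k -> new ArrayList<>())
              .add(new Suggestion(text, docFreq, score));
    }

    List<Suggestion> get(String term) {
        return byTerm.getOrDefault(term, new ArrayList<>());
    }

    /** Flattens to the text -> docfreq shape quoted above; the score is dropped here. */
    LinkedHashMap<String, Integer> toLegacy(String term) {
        LinkedHashMap<String, Integer> out = new LinkedHashMap<>();
        for (Suggestion s : get(term)) {
            out.put(s.text, s.docFreq);
        }
        return out;
    }

    public static void main(String[] args) {
        ScoredSuggestions result = new ScoredSuggestions();
        result.add("lucen", "lucene", 120, 0.92f);
        result.add("lucen", "lumen", 3, 0.61f);
        System.out.println(result.get("lucen").get(0).score);
        System.out.println(result.toLegacy("lucen"));
    }
}
```

The toLegacy method shows the point of the complaint: converting to the
existing map shape has nowhere to put the score.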

We are still using Token everywhere, again, which is bad news if we
want to do more complex things later. It would really make sense to
switch to the attributes API if this stuff needs to be flexible.

Even the input that comes into the spellchecker in
getSuggestions(SpellingOptions options) is just Tokens, and this is
pretty limiting. For instance, I think it makes way more sense for a
spellchecker API to take a Query and return corrected Query objects. In
my situation I could give better results that way, but the Solr APIs
stop me.
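
A rough sketch of that shape, under stated assumptions: QueryCorrector,
Correction and DictionaryCorrector are hypothetical names, the query type
is left generic so the example runs without Lucene, and the toy
implementation treats a query as a list of terms with a made-up score. It
is not how any existing Solr spellchecker works.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch only: whole query in, ranked corrected queries out.
class QueryCorrectionSketch {
    /** Hypothetical: a corrected query plus the score assigned to it. */
    static final class Correction<Q> {
        final Q query;
        final float score;

        Correction(Q query, float score) {
            this.query = query;
            this.score = score;
        }
    }

    /** Hypothetical interface; Q would be a Lucene Query in a real integration. */
    interface QueryCorrector<Q> {
        List<Correction<Q>> correct(Q query, int maxCorrections);
    }

    /** Toy implementation over term lists, using a fixed misspelling -> fix map. */
    static final class DictionaryCorrector implements QueryCorrector<List<String>> {
        private final Map<String, String> fixes;

        DictionaryCorrector(Map<String, String> fixes) {
            this.fixes = fixes;
        }

        @Override
        public List<Correction<List<String>>> correct(List<String> query, int maxCorrections) {
            List<String> fixed = new ArrayList<>();
            int changed = 0;
            for (String term : query) {
                String replacement = fixes.get(term);
                if (replacement != null) {
                    fixed.add(replacement);
                    changed++;
                } else {
                    fixed.add(term);
                }
            }
            List<Correction<List<String>>> out = new ArrayList<>();
            if (changed > 0 && maxCorrections > 0) {
                // Made-up score: the fraction of terms that were left untouched.
                out.add(new Correction<>(fixed, 1f - (float) changed / query.size()));
            }
            return out;
        }
    }

    public static void main(String[] args) {
        Map<String, String> fixes = new HashMap<>();
        fixes.put("spelchecker", "spellchecker");
        QueryCorrector<List<String>> corrector = new DictionaryCorrector(fixes);
        for (Correction<List<String>> c : corrector.correct(Arrays.asList("solr", "spelchecker"), 5)) {
            System.out.println(c.query + " score=" + c.score);
        }
    }
}
```

Because the corrector sees the whole query and returns scored results, the
caller can rank or combine corrections itself instead of relying on a
separate collation step.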

Apparently the whole Collator thing is designed to "do this for me",
but I have my own ideas (since my impl is new and different), only I'm
not able to implement them... I don't know how the hell the Collator
could be doing this, since I can't return the score.

I realize I could have completely discarded all the spellchecking
APIs, written a ton of code/re-invented wheels, and probably gotten
what I wanted, but I just wimped out and committed a shitty
spellchecker instead.
