lucene-solr-user mailing list archives

From Jason Blackerby <jblacke...@gmail.com>
Subject Re: Spell checking question from a Solr novice
Date Mon, 18 Oct 2010 21:24:08 GMT
If you know the misspellings you could prevent them from being added to the
dictionary with a StopFilterFactory like so:

    <fieldType name="textSpell" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <!-- Drop known misspellings before they reach the spelling dictionary -->
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="misspelled_words.txt"/>
        <!-- Strip every character that is not a lowercase ASCII letter -->
        <filter class="solr.PatternReplaceFilterFactory" pattern="([^a-z])" replacement="" replace="all"/>
        <!-- Keep only tokens between 2 and 50 characters long -->
        <filter class="solr.LengthFilterFactory" min="2" max="50"/>
      </analyzer>
    </fieldType>

where misspelled_words.txt contains the misspellings.
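
For context, here is a rough sketch of how a field type like this might be
wired into the index-based spellchecker. The field names "spell" and "title",
the spellchecker name "default", and the index directory are assumptions made
up for illustration; substitute whatever your own schema uses.

```xml
<!-- schema.xml: a dedicated field analyzed with textSpell.
     "spell" and "title" are hypothetical field names. -->
<field name="spell" type="textSpell" indexed="true" stored="false" multiValued="true"/>
<copyField source="title" dest="spell"/>

<!-- solrconfig.xml: build the spelling dictionary from that field. -->
<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <str name="queryAnalyzerFieldType">textSpell</str>
  <lst name="spellchecker">
    <str name="name">default</str>
    <str name="field">spell</str>
    <!-- Assumed location for the spellcheck index -->
    <str name="spellcheckIndexDir">./spellchecker</str>
    <str name="buildOnCommit">true</str>
  </lst>
</searchComponent>
```

Since the stop filter runs at index time, a change to misspelled_words.txt
should only take effect after the affected documents are reindexed and the
spellcheck index is rebuilt (for example with spellcheck.build=true on a
request).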

On Mon, Oct 18, 2010 at 5:14 PM, Pradeep Singh <pksinghus@gmail.com> wrote:

> I think a spellchecker based on your index has clear advantages. You can
> spellcheck words specific to your domain which may not be available in an
> outside dictionary. You can always dump the list from WordNet to get a
> starter English dictionary.
>
> But then it also means that misspelled words from your domain become the
> suggested correct word. Hmmm ... you'll need to have a way to prune out
> such words. Even then, your own domain-based dictionary is a total go.
>
> On Mon, Oct 18, 2010 at 1:55 PM, Jonathan Rochkind <rochkind@jhu.edu> wrote:
>
> > In general, the benefit of the built-in Solr spellcheck is that it can
> > use a dictionary based on your actual index.
> >
> > If you want to use some external API, you certainly can, in your actual
> > client app -- but it doesn't really need to involve Solr at all anymore,
> > does it?  Is there any benefit I'm not thinking of to doing that on the
> > Solr side, instead of just in your client app?
> >
> > I think Yahoo (and maybe Microsoft?) have similar APIs with more generous
> > ToSs, but I haven't looked in a while.
> >
> >
> > Xin Li wrote:
> >
> >> Oops, never mind. Just read Google API policy. 1000 queries per day
> >> limit & for non-commercial use only.
> >>
> >>
> >> -----Original Message-----
> >> From: Xin Li
> >> Sent: Monday, October 18, 2010 3:43 PM
> >> To: solr-user@lucene.apache.org
> >> Subject: Spell checking question from a Solr novice
> >>
> >> Hi,
> >> I am looking for a quick solution to improve a search engine's spell
> >> checking performance. I was wondering if anyone tried to integrate
> >> Google SpellCheck API with Solr search engine (if possible). Google
> >> spellcheck came to my mind because of two reasons. First, it is costly
> >> to clean up the data to be used as spell check baseline. Secondly,
> >> Google probably has the most complete set of misspelled search terms.
> >> That's why I would like to know if it is a feasible way to go.
> >>
> >> Thanks,
> >> Xin
> >> This electronic mail message contains information that (a) is or may be
> >> CONFIDENTIAL, PROPRIETARY IN NATURE, OR OTHERWISE PROTECTED BY LAW FROM
> >> DISCLOSURE, and (b) is intended only for the use of the
> >> addressee(s) named herein.  If you are not an intended recipient, please
> >> contact the sender immediately and take the steps necessary to delete
> >> the message completely from your computer system.
> >>
> >> Not Intended as a Substitute for a Writing: Notwithstanding the Uniform
> >> Electronic Transaction Act or any other law of similar effect, absent an
> >> express statement to the contrary, this e-mail message, its contents,
> >> and any attachments hereto are not intended to represent an offer or
> >> acceptance to enter into a contract and are not otherwise intended to
> >> bind this sender, barnesandnoble.com llc, barnesandnoble.com inc. or any
> >> other person or entity.
> >>
> >>
> >
>
