Date: Mon, 18 Oct 2010 17:24:08 -0400
Subject: Re: Spell checking question from a Solr novice
From: Jason Blackerby
To: solr-user@lucene.apache.org

If you know the misspellings, you could prevent them from being added to the
dictionary with a StopFilterFactory like so:

where misspelled_words.txt contains the misspellings.

On Mon, Oct 18, 2010 at 5:14 PM, Pradeep Singh wrote:

> I think a spellchecker based on your index has clear advantages. You can
> spellcheck words specific to your domain which may not be available in an
> outside dictionary. You can always dump the list from WordNet to get a
> starter English dictionary.
>
> But then it also means that misspelled words from your domain become the
> suggested correct word. Hmm ... you'll need to have a way to prune out such
> words. Even then, your own domain-based dictionary is a total go.
>
> On Mon, Oct 18, 2010 at 1:55 PM, Jonathan Rochkind wrote:
>
> > In general, the benefit of the built-in Solr spellcheck is that it can use
> > a dictionary based on your actual index.
> >
> > If you want to use some external API, you certainly can, in your actual
> > client app -- but it doesn't really need to involve Solr at all anymore,
> > does it?
> > Is there any benefit I'm not thinking of to doing that on the Solr
> > side, instead of just in your client app?
> >
> > I think Yahoo (and maybe Microsoft?) have similar APIs with more generous
> > ToSs, but I haven't looked in a while.
> >
> >
> > Xin Li wrote:
> >
> >> Oops, never mind. Just read the Google API policy: 1000 queries per day
> >> limit, and for non-commercial use only.
> >>
> >>
> >> -----Original Message-----
> >> From: Xin Li
> >> Sent: Monday, October 18, 2010 3:43 PM
> >> To: solr-user@lucene.apache.org
> >> Subject: Spell checking question from a Solr novice
> >>
> >> Hi,
> >> I am looking for a quick solution to improve a search engine's spell
> >> checking performance. I was wondering if anyone has tried to integrate
> >> the Google SpellCheck API with the Solr search engine (if possible).
> >> Google spellcheck came to my mind for two reasons. First, it is costly
> >> to clean up the data to be used as a spell check baseline. Second,
> >> Google probably has the most complete set of misspelled search terms.
> >> That's why I would like to know if it is a feasible way to go.
> >>
> >> Thanks,
> >> Xin
> >> This electronic mail message contains information that (a) is or may be
> >> CONFIDENTIAL, PROPRIETARY IN NATURE, OR OTHERWISE PROTECTED BY LAW FROM
> >> DISCLOSURE, and (b) is intended only for the use of the
> >> addressee(s) named herein. If you are not an intended recipient, please
> >> contact the sender immediately and take the steps necessary to delete the
> >> message completely from your computer system.
> >>
> >> Not Intended as a Substitute for a Writing: Notwithstanding the Uniform
> >> Electronic Transaction Act or any other law of similar effect, absent an
> >> express statement to the contrary, this e-mail message, its contents, and
> >> any attachments hereto are not intended to represent an offer or
> >> acceptance to enter into a contract and are not otherwise intended to bind
> >> this sender, barnesandnoble.com llc, barnesandnoble.com inc. or any other
> >> person or entity.
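Jason's StopFilterFactory suggestion at the top of this reply can be sketched roughly as follows. This is a minimal sketch assuming a Solr 1.4-era schema.xml and solrconfig.xml; the names textSpell, spell and text, the tokenizer choice, and the spellchecker settings are hypothetical. Only StopFilterFactory and misspelled_words.txt come from the message itself.

```xml
<!-- schema.xml (sketch): field type for the field that feeds the spelling dictionary -->
<fieldType name="textSpell" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Drop every term listed in misspelled_words.txt (one word per line,
         in the core's conf/ directory) so it never enters the dictionary -->
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="misspelled_words.txt"/>
  </analyzer>
  <analyzer type="query">
    <!-- No stop filter here: misspelled query terms must survive analysis
         so the spellchecker can still suggest corrections for them -->
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<!-- Hypothetical dictionary field, copied from an existing "text" field -->
<field name="spell" type="textSpell" indexed="true" stored="false" multiValued="true"/>
<copyField source="text" dest="spell"/>

<!-- solrconfig.xml (sketch): index-based spellchecker reading the "spell" field -->
<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <str name="queryAnalyzerFieldType">textSpell</str>
  <lst name="spellchecker">
    <str name="name">default</str>
    <str name="classname">solr.IndexBasedSpellChecker</str>
    <str name="field">spell</str>
    <str name="spellcheckIndexDir">./spellchecker</str>
  </lst>
</searchComponent>
```

The stop filter sits only in the index-time analyzer in this sketch. As far as I understand it, the spellcheck component analyzes incoming queries with the query analyzer of queryAnalyzerFieldType, so a query-side stop filter would strip the very misspellings users type and leave nothing to correct. Because the filtering happens at index time, edits to misspelled_words.txt would presumably only take effect after reindexing and rebuilding the dictionary (for example with spellcheck.build=true).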