From: Samuel García Martínez
To: java-user@lucene.apache.org
Date: Thu, 21 Feb 2013 22:32:25 +0100
Subject: Re: possible bug on Spellchecker
References: <8F0D0142CA7ECC4287A9EC1BD8CB880C19D6AE23FC@USLVDCMBVP01.ingramcontent.com>

Here it is: https://issues.apache.org/jira/browse/LUCENE-4793 :)

On Thu, Feb 21, 2013 at 9:02 PM, Samuel García Martínez <samuelgmartinez@gmail.com> wrote:

> Yes, of course I can. I'll try to open it tonight (European time) or
> tomorrow as soon as I get to the office.
>
> On Thu, Feb 21, 2013 at 4:14 PM, Dyer, James wrote:
>
>> Samuel,
>>
>> Do you think you could write a failing unit test and open a JIRA issue?
>> Or at the least open a JIRA issue with all the details without a test?
>>
>> James Dyer
>> Ingram Content Group
>> (615) 213-4311
>>
>> -----Original Message-----
>> From: Samuel García Martínez [mailto:samuelgmartinez@gmail.com]
>> Sent: Thursday, February 21, 2013 2:33 AM
>> To: java-user@lucene.apache.org
>> Subject: Re: possible bug on Spellchecker
>> Importance: Low
>>
>> I'm using Solr 3.6, and DirectSpellChecker is available only in v4+.
>> Moreover, for "big" indexes I prefer using a sidekick index rather than
>> iterating over the term dictionary.
>>
>> On Thu, Feb 21, 2013 at 8:19 AM, Jack Krupansky wrote:
>>
>> > Any reason that you are not using the DirectSpellChecker?
>> >
>> > See:
>> > http://lucene.apache.org/core/4_0_0/suggest/org/apache/lucene/search/spell/DirectSpellChecker.html
>> >
>> > -- Jack Krupansky
>> >
>> > -----Original Message-----
>> > From: Samuel García Martínez
>> > Sent: Wednesday, February 20, 2013 3:34 PM
>> > To: java-user@lucene.apache.org
>> > Subject: possible bug on Spellchecker
>> >
>> > Hi all,
>> >
>> > While debugging the behaviour of the Solr spellchecker
>> > (IndexBasedSpellChecker, which delegates to the Lucene SpellChecker),
>> > I think I found a bug when the input is a 6-letter word:
>> > - george
>> > - anthem
>> > - argued
>> > - fluent
>> >
>> > Because of getMin() and getMax(), the grams indexed for these terms
>> > have sizes 3 and 4. So the fields would be something like this:
>> > - for "*george*"
>> >     - start3: "geo"
>> >     - start4: "geor"
>> >     - end3: "rge"
>> >     - end4: "orge"
>> >     - 3: "geo", "eor", "org", "rge"
>> >     - 4: "geor", "eorg", "orge"
>> > - for "*anthem*"
>> >     - start3: "ant"
>> >     - start4: "anth"
>> >     - end3: "hem"
>> >     - end4: "them"
>> >
>> > The problem shows up when the user swaps the 3rd and 4th characters,
>> > misspelling the word like this:
>> > - geroge
>> > - anhtem
>> >
>> > The queries generated for these terms are (SHOULD boolean queries):
>> > - for "*geroge*"
>> >     - start3: "ger"
>> >     - start4: "gero"
>> >     - end3: "oge"
>> >     - end4: "roge"
>> >     - 3: "ger", "ero", "rog", "oge"
>> >     - 4: "gero", "erog", "roge"
>> > - for "*anhtem*"
>> >     - start3: "anh"
>> >     - start4: "anht"
>> >     - end3: "tem"
>> >     - end4: "htem"
>> >     - 3: "anh", "nht", "hte", "tem"
>> >     - 4: "anht", "nhte", "htem"
>> >
>> > So, as you can see, this kind of misspelling never matches the
>> > suitable suggestions, although the string-distance score is
>> > 0.95555556.
>> >
>> > I think getMin(int l) and getMax(int l) should return 2 and 3,
>> > respectively, for l == 6. Debugging other values, I did not find any
>> > problem with any kind of misspelling.
>> >
>> > Any thoughts about this?
>> >
>> > --
>> > Regards,
>> > Samuel García
>> >
>> > ---------------------------------------------------------------------
>> > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> > For additional commands, e-mail: java-user-help@lucene.apache.org
>>
>> --
>> Regards,
>> Samuel García.
>
> --
> Regards,
> Samuel García.

--
Regards,
Samuel García.
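
The gram lists in the original report can be checked with a small standalone sketch. It is not Lucene's code and does not touch an index: the class and method names are hypothetical, the field names ("start3", "end3", "3", ...) follow the report, and the gram sizes are passed in by hand (3 and 4 as reported for 6-letter words, 2 and 3 as proposed). It assumes a suggestion can only be retrieved if the indexed word and the typed word share at least one field/gram term.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper for illustration only; not part of Lucene or Solr.
class GramOverlapSketch {

    /** All substrings of length n, in order of appearance. */
    static List<String> grams(String word, int n) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i + n <= word.length(); i++) {
            out.add(word.substring(i, i + n));
        }
        return out;
    }

    /** Field-qualified terms such as "start3:geo", "end3:rge", "3:eor" for sizes min..max. */
    static Set<String> fieldTerms(String word, int min, int max) {
        Set<String> terms = new LinkedHashSet<>();
        for (int n = min; n <= max; n++) {
            List<String> g = grams(word, n);
            if (g.isEmpty()) {
                continue; // word shorter than n
            }
            terms.add("start" + n + ":" + g.get(0));
            terms.add("end" + n + ":" + g.get(g.size() - 1));
            for (String s : g) {
                terms.add(n + ":" + s);
            }
        }
        return terms;
    }

    /** Terms present both for the indexed word and for the typed (misspelled) word. */
    static Set<String> shared(String indexed, String typed, int min, int max) {
        Set<String> common = fieldTerms(indexed, min, max);
        common.retainAll(fieldTerms(typed, min, max));
        return common;
    }

    public static void main(String[] args) {
        String[][] pairs = {{"george", "geroge"}, {"anthem", "anhtem"}};
        for (String[] p : pairs) {
            System.out.println(p[0] + " / " + p[1]
                + "  sizes 3-4: " + shared(p[0], p[1], 3, 4)
                + "  sizes 2-3: " + shared(p[0], p[1], 2, 3));
        }
    }
}
```

With sizes 3 and 4, both pairs share no terms at all, which is consistent with the report that the correct word is never retrieved. With sizes 2 and 3, george/geroge share start2:ge, end2:ge and 2:ge, and anthem/anhtem share start2:an, end2:em, 2:an and 2:em, so under the assumption above the proposed sizes would let the candidates through to the distance check.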