From: David Spencer
Date: Wed, 15 Sep 2004 12:54:54 -0700
To: Lucene Users List
Subject: Re: NGramSpeller contribution -- Re: combining open office spellchecker with Lucene

Aad Nales wrote:
> By trying: if you type const you will find that it returns 216 hits. The
> third sports 'const' as a term (space separated and all). I would expect
> 'conts' to return with const as well. But again I might be mistaken. I
> am now trying to figure out what the problem might be:
>
> 1. my expectations (most likely ;-)
> 2. something in the code.

I enhanced the code to store simple transpositions as well, and I regenerated
my site with n-grams from 2 to 5 characters. If you set the transposition
boost up to 10, then "const" is returned 2nd:

http://www.searchmorph.com/kat/spell.jsp?s=conts&min=2&max=5&maxd=5&maxr=10&bstart=2.0&bend=1.0&btranspose=10.0&popular=1

> -----Original Message-----
> From: Andrzej Bialecki [mailto:ab@getopt.org]
> Sent: Wednesday, 15 September, 2004 12:23
> To: Lucene Users List
> Subject: Re: NGramSpeller contribution -- Re: combining open office
> spellchecker with Lucene
>
> Aad Nales wrote:
>
>>David,
>>
>>Perhaps I misunderstand something, so please correct me if I do. I used
>>http://www.searchmorph.com/kat/spell.jsp to look for conts without
>>changing any of the default values. What I got as results did not
>>include 'const', which has quite a high frequency in your index and
>
> ??? How do you know that? Remember, this is an index of _Java_docs, and
> "const" is not a Java keyword.
>
>>should have a pretty low Levenshtein distance. Any idea what causes
>>this behavior?
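To illustrate why storing transpositions can help with "conts", here is a small, self-contained sketch. It is not the NGramSpeller code: the class and method names (GramSketch, grams, sharedGrams, transpositions) are hypothetical, and reading "store simple transpositions" as "store every single adjacent-character swap of the word" is an assumption. The sketch shows that "conts" and "const" share few n-grams once n exceeds 2, because the typo sits at the end of the word, while "conts" appears verbatim among the adjacent swaps of "const". That would be one plausible reason a heavily boosted transposition match (the btranspose parameter in the spell.jsp URL) lifts "const" in the ranking.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only; not the NGramSpeller source.
final class GramSketch {

    // All contiguous substrings of length n, e.g. grams("const", 2) gives [co, on, ns, st].
    static List<String> grams(String word, int n) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i + n <= word.length(); i++) {
            out.add(word.substring(i, i + n));
        }
        return out;
    }

    // How many of the query's n-grams also occur among the candidate's n-grams.
    static int sharedGrams(String query, String candidate, int n) {
        Set<String> candidateGrams = new HashSet<>(grams(candidate, n));
        int shared = 0;
        for (String g : grams(query, n)) {
            if (candidateGrams.contains(g)) {
                shared++;
            }
        }
        return shared;
    }

    // Every distinct variant produced by swapping one pair of adjacent characters.
    // Assumption: this is what "store simple transpositions" means; the thread does not say.
    static Set<String> transpositions(String word) {
        Set<String> out = new LinkedHashSet<>();
        for (int i = 0; i + 1 < word.length(); i++) {
            char[] chars = word.toCharArray();
            char tmp = chars[i];
            chars[i] = chars[i + 1];
            chars[i + 1] = tmp;
            String variant = new String(chars);
            if (!variant.equals(word)) { // swapping "oo" yields the word itself
                out.add(variant);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        for (int n = 2; n <= 5; n++) {
            System.out.println(n + "-grams of \"conts\" found in \"const\": "
                    + sharedGrams("conts", "const", n));
        }
        System.out.println("transpositions of const: " + transpositions("const"));
        System.out.println("contains conts: " + transpositions("const").contains("conts"));
    }
}
```

Running main should report 2, 1, 0 and 0 shared grams for n = 2 through 5, and list the four swaps of "const" as [ocnst, cnost, cosnt, conts], the last of which is exactly the misspelling.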
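On Aad's expectation that "const" should have a pretty low Levenshtein distance from "conts": under plain Levenshtein distance the swapped "t" and "s" cost two substitutions, so the distance is 2. A transposition-aware variant (optimal string alignment, a restricted form of Damerau-Levenshtein) scores the same pair as 1. The sketch below is a generic textbook implementation; the class name EditDistanceSketch is hypothetical, and the thread does not say which distance, if any, spell.jsp uses to rank suggestions.

```java
// Generic textbook edit distances; not taken from the spell.jsp code.
final class EditDistanceSketch {

    // Dynamic-programming edit distance. With allowTransposition == false this is
    // plain Levenshtein (insert, delete, substitute, each cost 1). With true it is
    // optimal string alignment: swapping two adjacent characters also costs 1.
    static int distance(String a, String b, boolean allowTransposition) {
        int[][] d = new int[a.length() + 1][b.length() + 1];
        for (int i = 0; i <= a.length(); i++) {
            d[i][0] = i; // delete every character of a's prefix
        }
        for (int j = 0; j <= b.length(); j++) {
            d[0][j] = j; // insert every character of b's prefix
        }
        for (int i = 1; i <= a.length(); i++) {
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                d[i][j] = Math.min(Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1),
                        d[i - 1][j - 1] + cost);
                // Adjacent swap, e.g. "ts" in "conts" against "st" in "const".
                if (allowTransposition && i > 1 && j > 1
                        && a.charAt(i - 1) == b.charAt(j - 2)
                        && a.charAt(i - 2) == b.charAt(j - 1)) {
                    d[i][j] = Math.min(d[i][j], d[i - 2][j - 2] + 1);
                }
            }
        }
        return d[a.length()][b.length()];
    }

    public static void main(String[] args) {
        System.out.println("Levenshtein(conts, const) = " + distance("conts", "const", false));
        System.out.println("OSA(conts, const)         = " + distance("conts", "const", true));
    }
}
```

Running main prints 2 for Levenshtein and 1 for optimal string alignment, which is why a ranking step that treats transpositions as a single edit would favor "const" more strongly for this typo.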