From: Ian Lea
Date: Tue, 3 May 2011 10:21:33 +0100
Subject: Re: "fuzzy prefix" search
To: java-user@lucene.apache.org
References: <000001cc08bf$19583a50$4c08aef0$@thetaphi.de>

I'd assumed that FuzzyQuery wouldn't ignore case, but I could be wrong.

What would be the edit distance between "mer" and "merlot"? Would it
be less than 1.5, which I reckon would be the value of length(term)*0.5
as detailed in the javadocs? Seems unlikely, but I don't really know
anything about the Levenshtein (edit distance) algorithm as used by
FuzzyQuery.

Wouldn't a PrefixQuery be more appropriate here?

--
Ian.

On Tue, May 3, 2011 at 10:10 AM, Clemens Wyss wrote:
> Unfortunately lowercasing doesn't help.
> Also, doesn't the FuzzyQuery ignore casing?
>
>> -----Original Message-----
>> From: Ian Lea [mailto:ian.lea@gmail.com]
>> Sent: Tuesday, May 3, 2011 11:06
>> To: java-user@lucene.apache.org
>> Subject: Re: "fuzzy prefix" search
>>
>> Mer != mer. The latter will be what is indexed because StandardAnalyzer
>> calls LowerCaseFilter.
>>
>> --
>> Ian.
>>
>>
>> On Tue, May 3, 2011 at 9:56 AM, Clemens Wyss wrote:
>> > Sorry for coming back to my issue. Can anybody explain why my "simple"
>> > unit test below fails? Any hint/help appreciated.
>> >
>> > Directory directory = new RAMDirectory();
>> > IndexWriter indexWriter = new IndexWriter( directory,
>> >     new StandardAnalyzer( Version.LUCENE_31 ),
>> >     IndexWriter.MaxFieldLength.UNLIMITED );
>> > Document document = new Document();
>> > document.add( new Field( "test", "Merlot",
>> >     Field.Store.YES, Field.Index.ANALYZED ) );
>> > indexWriter.addDocument( document );
>> > IndexReader indexReader = indexWriter.getReader();
>> > IndexSearcher searcher = new IndexSearcher( indexReader );
>> > Query q = new FuzzyQuery( new Term( "test", "Mer" ), 0.5f, 0, 10 );
>> > // or
>> > // Query q = new FuzzyQuery( new Term( "test", "Mer" ), 0.5f );
>> > TopDocs result = searcher.search( q, 10 );
>> > Assert.assertEquals( 1, result.totalHits );
>> >
>> > - Clemens
>> >
>> >> -----Original Message-----
>> >> From: Clemens Wyss [mailto:clemensdev@mysign.ch]
>> >> Sent: Monday, May 2, 2011 23:01
>> >> To: java-user@lucene.apache.org
>> >> Subject: RE: "fuzzy prefix" search
>> >>
>> >> Is it the combination of FuzzyQuery and Term which makes the search
>> >> go for "word boundaries"?
>> >>
>> >> > -----Original Message-----
>> >> > From: Clemens Wyss [mailto:clemensdev@mysign.ch]
>> >> > Sent: Monday, May 2, 2011 14:13
>> >> > To: java-user@lucene.apache.org
>> >> > Subject: RE: "fuzzy prefix" search
>> >> >
>> >> > I tried this too, but unfortunately I only get hits when the search
>> >> > term is at least as long as the word to be looked up.
>> >> >
>> >> > E.g.:
>> >> > ...
>> >> > Directory directory = new RAMDirectory();
>> >> > IndexWriter indexWriter = new IndexWriter( directory,
>> >> >     IndexManager.getIndexingAnalyzer( LOCALE_DE ),
>> >> >     IndexWriter.MaxFieldLength.UNLIMITED );
>> >> >
>> >> > Document document = new Document();
>> >> > document.add( new Field( "test", "Merlot",
>> >> >     Field.Store.YES, Field.Index.ANALYZED ) );
>> >> > indexWriter.addDocument( document );
>> >> >
>> >> > IndexReader indexReader = indexWriter.getReader();
>> >> > IndexSearcher searcher = new IndexSearcher( indexReader );
>> >> >
>> >> > Query q = new FuzzyQuery( new Term( "test", "Mer" ), 0.6f, 1 );
>> >> > TopDocs result = searcher.search( q, 10 );
>> >> > Assert.assertEquals( 1, result.totalHits );
>> >> > ...
>> >> >
>> >> > > -----Original Message-----
>> >> > > From: Uwe Schindler [mailto:uwe@thetaphi.de]
>> >> > > Sent: Monday, May 2, 2011 13:50
>> >> > > To: java-user@lucene.apache.org
>> >> > > Subject: RE: "fuzzy prefix" search
>> >> > >
>> >> > > Hi,
>> >> > >
>> >> > > You can pass an integer to FuzzyQuery which defines the number of
>> >> > > characters that are seen as prefix. So all terms must match this
>> >> > > prefix and the rest of each term is matched using fuzzy.
>> >> > >
>> >> > > Uwe
>> >> > >
>> >> > > -----
>> >> > > Uwe Schindler
>> >> > > H.-H.-Meier-Allee 63, D-28213 Bremen
>> >> > > http://www.thetaphi.de
>> >> > > eMail: uwe@thetaphi.de
>> >> > >
>> >> > > > -----Original Message-----
>> >> > > > From: Clemens Wyss [mailto:clemensdev@mysign.ch]
>> >> > > > Sent: Monday, May 02, 2011 1:47 PM
>> >> > > > To: java-user@lucene.apache.org
>> >> > > > Subject: "fuzzy prefix" search
>> >> > > >
>> >> > > > I'd like to search fuzzily but not on a full term.
>> >> > > > E.g.
>> >> > > > I have a text "Merlot del Ticino"
>> >> > > > I'd like
>> >> > > > "mer", "merr", "melo", ... to match.
>> >> > > >
>> >> > > > If I use FuzzyQuery only "merlot", "merlott" hit.
>> >> > > > What Query-combination should I use?
>> >> > > >
>> >> > > > Thx
>> >> > > > Clemens
>> >> > > >
>> >> > > >
>> >> > > > ---------------------------------------------------------------------
>> >> > > > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> >> > > > For additional commands, e-mail: java-user-help@lucene.apache.org
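
On the edit-distance question raised at the top of this message, a minimal
sketch in plain Java (no Lucene dependency) may help. It computes the
Levenshtein distance with the textbook dynamic-programming method and then
applies the rule of thumb quoted from the javadocs in this thread,
distance < length(term) * minimumSimilarity. The class and method names
(EditDistanceSketch, levenshtein, withinJavadocBound) are invented for this
illustration. FuzzyQuery's real scoring may differ in detail, for example in
how it treats the prefix length and terms of different lengths, so treat this
as an approximation and not as Lucene's implementation.

```java
// Sketch only: plain Java, no Lucene dependency. The class and method names
// are hypothetical and are not part of any Lucene API.
final class EditDistanceSketch {

    // Textbook dynamic-programming Levenshtein distance
    // (single-character insert, delete, substitute; each costs 1).
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // turning "" into the first j chars of b takes j insertions
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // turning the first i chars of a into "" takes i deletions
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
            }
            int[] tmp = prev; // reuse the two rows instead of a full matrix
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Assumption: applies the javadoc rule of thumb as it is read in this thread,
    // i.e. a term is "similar" if distance < length(queryTerm) * minimumSimilarity.
    static boolean withinJavadocBound(String queryTerm, String indexedTerm,
                                      float minimumSimilarity) {
        return levenshtein(queryTerm, indexedTerm)
                < queryTerm.length() * minimumSimilarity;
    }

    public static void main(String[] args) {
        System.out.println(levenshtein("mer", "merlot"));
        System.out.println(withinJavadocBound("mer", "merlot", 0.5f));
        System.out.println(withinJavadocBound("merlott", "merlot", 0.5f));
    }
}
```

Under that reading, "mer" is three insertions away from "merlot", well over
the 1.5 bound, which would be consistent with the failing assertion, while
"merlott" is a single edit away and stays under its bound of 3.5.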