Message-ID: <855496.50707.qm@web130109.mail.mud.yahoo.com>
References: <000001cc08bf$19583a50$4c08aef0$@thetaphi.de> <796945.6397.qm@web130103.mail.mud.yahoo.com> <733611.86868.qm@web130122.mail.mud.yahoo.com>
Date: Tue, 3 May 2011 14:11:45 -0700 (PDT)
From: Otis Gospodnetic
Subject: Re: AW: AW: AW: "fuzzy prefix" search
To: java-user@lucene.apache.org
Reply-To: java-user@lucene.apache.org
Mailing-List: contact java-user-help@lucene.apache.org; run by ezmlm

Clemens - that's just an example. Stick another tokenizer in there, WhitespaceTokenizer for example.

Otis
----
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/

----- Original Message ----
> From: Clemens Wyss
> To: "java-user@lucene.apache.org"
> Sent: Tue, May 3, 2011 4:31:14 PM
> Subject: AW: AW: AW: "fuzzy prefix" search
>
> But doesn't the KeywordTokenizer extract single words out of the stream?
> I would like to create n-grams on the stream (field content) as it is...
>
> > -----Original Message-----
> > From: Otis Gospodnetic [mailto:otis_gospodnetic@yahoo.com]
> > Sent: Tuesday, 3 May 2011 21:31
> > To: java-user@lucene.apache.org
> > Subject: Re: AW: AW: "fuzzy prefix" search
> >
> > Clemens,
> >
> > Something a la:
> >
> > public TokenStream tokenStream(String fieldName, Reader r) {
> >   return new EdgeNGramTokenFilter(new KeywordTokenizer(r),
> >       EdgeNGramTokenFilter.Side.FRONT, 1, 4);
> > }
> >
> > Check out page 265 of Lucene in Action 2.
> >
> > Otis
> > ----
> > Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
> > Lucene ecosystem search :: http://search-lucene.com/
> >
> > ----- Original Message ----
> > > From: Clemens Wyss
> > > To: "java-user@lucene.apache.org"
> > > Sent: Tue, May 3, 2011 12:57:39 PM
> > > Subject: AW: AW: "fuzzy prefix" search
> > >
> > > What does a simple Analyzer look like that just "n-grams" the docs/fields?
> > >
> > > class SimpleNGramAnalyzer extends Analyzer
> > > {
> > >   @Override
> > >   public TokenStream tokenStream ( String fieldName, Reader reader )
> > >   {
> > >     EdgeNGramTokenFilter... ???
> > >   }
> > > }
> > >
> > > > -----Original Message-----
> > > > From: Otis Gospodnetic [mailto:otis_gospodnetic@yahoo.com]
> > > > Sent: Tuesday, 3 May 2011 13:36
> > > > To: java-user@lucene.apache.org
> > > > Subject: Re: AW: "fuzzy prefix" search
> > > >
> > > > Hi,
> > > >
> > > > I didn't read this thread closely, but just in case:
> > > > * Is this something you can handle with synonyms?
> > > > * If this is for English and you are trying to handle typos, there is a
> > > >   list of common English misspellings out there that you could perhaps
> > > >   use for this.
> > > > * Have you considered n-gramming your tokens? Not sure if this would
> > > >   help, I didn't read the messages/examples closely enough, but you may
> > > >   want to look at this if you haven't done so yet.
> > > >
> > > > Otis
> > > > ----
> > > > Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
> > > > Lucene ecosystem search :: http://search-lucene.com/
> > > >
> > > > ----- Original Message ----
> > > > > From: Clemens Wyss
> > > > > To: "java-user@lucene.apache.org"
> > > > > Sent: Tue, May 3, 2011 5:25:30 AM
> > > > > Subject: AW: "fuzzy prefix" search
> > > > >
> > > > > >PrefixQuery
> > > > > I'd like the combination of prefix and fuzzy ;-) because people could
> > > > > also type "menlo" or "märl", and in any of these cases I'd like to get
> > > > > a hit on Merlot (for suggesting Merlot).
> > > > >
> > > > > > -----Original Message-----
> > > > > > From: Ian Lea [mailto:ian.lea@gmail.com]
> > > > > > Sent: Tuesday, 3 May 2011 11:22
> > > > > > To: java-user@lucene.apache.org
> > > > > > Subject: Re: "fuzzy prefix" search
> > > > > >
> > > > > > I'd assumed that FuzzyQuery wouldn't ignore case but I could be wrong.
> > > > > > What would be the edit distance between "mer" and "merlot"? Would
> > > > > > it be less than 1.5, which I reckon would be the value of
> > > > > > length(term)*0.5 as detailed in the javadocs?
> > > > > > Seems unlikely, but I don't really know anything about the
> > > > > > Levenshtein (edit distance) algorithm as used by FuzzyQuery.
> > > > > > Wouldn't a PrefixQuery be more appropriate here?
> > > > > >
> > > > > > --
> > > > > > Ian.
> > > > > >
> > > > > > On Tue, May 3, 2011 at 10:10 AM, Clemens Wyss wrote:
> > > > > > > Unfortunately lowercasing doesn't help.
> > > > > > > Also, doesn't the FuzzyQuery ignore casing?
> > > > > > >
> > > > > > >> -----Original Message-----
> > > > > > >> From: Ian Lea [mailto:ian.lea@gmail.com]
> > > > > > >> Sent: Tuesday, 3 May 2011 11:06
> > > > > > >> To: java-user@lucene.apache.org
> > > > > > >> Subject: Re: "fuzzy prefix" search
> > > > > > >>
> > > > > > >> Mer != mer. The latter will be what is indexed because
> > > > > > >> StandardAnalyzer calls LowerCaseFilter.
> > > > > > >>
> > > > > > >> --
> > > > > > >> Ian.
> > > > > > >>
> > > > > > >> On Tue, May 3, 2011 at 9:56 AM, Clemens Wyss wrote:
> > > > > > >> > Sorry for coming back to my issue. Can anybody explain why my
> > > > > > >> > "simple" unit test below fails?
> > > > > > >> > Any hint/help appreciated.
> > > > > > >> >
> > > > > > >> > Directory directory = new RAMDirectory();
> > > > > > >> > IndexWriter indexWriter = new IndexWriter( directory,
> > > > > > >> >     new StandardAnalyzer( Version.LUCENE_31 ),
> > > > > > >> >     IndexWriter.MaxFieldLength.UNLIMITED );
> > > > > > >> > Document document = new Document();
> > > > > > >> > document.add( new Field( "test", "Merlot",
> > > > > > >> >     Field.Store.YES, Field.Index.ANALYZED ) );
> > > > > > >> > indexWriter.addDocument( document );
> > > > > > >> > IndexReader indexReader = indexWriter.getReader();
> > > > > > >> > IndexSearcher searcher = new IndexSearcher( indexReader );
> > > > > > >> > Query q = new FuzzyQuery( new Term( "test", "Mer" ), 0.5f, 0, 10 );
> > > > > > >> > // or Query q = new FuzzyQuery( new Term( "test", "Mer" ), 0.5f );
> > > > > > >> > TopDocs result = searcher.search( q, 10 );
> > > > > > >> > Assert.assertEquals( 1, result.totalHits );
> > > > > > >> >
> > > > > > >> > - Clemens
> > > > > > >> >
> > > > > > >> >> -----Original Message-----
> > > > > > >> >> From: Clemens Wyss [mailto:clemensdev@mysign.ch]
> > > > > > >> >> Sent: Monday, 2 May 2011 23:01
> > > > > > >> >> To: java-user@lucene.apache.org
> > > > > > >> >> Subject: AW: "fuzzy prefix" search
> > > > > > >> >>
> > > > > > >> >> Is it the combination of FuzzyQuery and Term which makes the
> > > > > > >> >> search go for "word boundaries"?
> > > > > > >> >>
> > > > > > >> >> > -----Original Message-----
> > > > > > >> >> > From: Clemens Wyss [mailto:clemensdev@mysign.ch]
> > > > > > >> >> > Sent: Monday, 2 May 2011 14:13
> > > > > > >> >> > To: java-user@lucene.apache.org
> > > > > > >> >> > Subject: AW: "fuzzy prefix" search
> > > > > > >> >> >
> > > > > > >> >> > I tried this too, but unfortunately I only get hits when
> > > > > > >> >> > the search term is at least as long as the word to be looked up.
> > > > > > >> >> >
> > > > > > >> >> > E.g.:
> > > > > > >> >> > ...
> > > > > > >> >> > Directory directory = new RAMDirectory();
> > > > > > >> >> > IndexWriter indexWriter = new IndexWriter( directory,
> > > > > > >> >> >     IndexManager.getIndexingAnalyzer( LOCALE_DE ),
> > > > > > >> >> >     IndexWriter.MaxFieldLength.UNLIMITED );
> > > > > > >> >> >
> > > > > > >> >> > Document document = new Document();
> > > > > > >> >> > document.add( new Field( "test", "Merlot",
> > > > > > >> >> >     Field.Store.YES, Field.Index.ANALYZED ) );
> > > > > > >> >> > indexWriter.addDocument( document );
> > > > > > >> >> >
> > > > > > >> >> > IndexReader indexReader = indexWriter.getReader();
> > > > > > >> >> > IndexSearcher searcher = new IndexSearcher( indexReader );
> > > > > > >> >> >
> > > > > > >> >> > Query q = new FuzzyQuery( new Term( "test", "Mer" ), 0.6f, 1 );
> > > > > > >> >> > TopDocs result = searcher.search( q, 10 );
> > > > > > >> >> > Assert.assertEquals( 1, result.totalHits );
> > > > > > >> >> > ...
> > > > > > >> >> >
> > > > > > >> >> > > -----Original Message-----
> > > > > > >> >> > > From: Uwe Schindler [mailto:uwe@thetaphi.de]
> > > > > > >> >> > > Sent: Monday, 2 May 2011 13:50
> > > > > > >> >> > > To: java-user@lucene.apache.org
> > > > > > >> >> > > Subject: RE: "fuzzy prefix" search
> > > > > > >> >> > >
> > > > > > >> >> > > Hi,
> > > > > > >> >> > >
> > > > > > >> >> > > You can pass an integer to FuzzyQuery which defines the
> > > > > > >> >> > > number of characters that are seen as prefix. So all
> > > > > > >> >> > > terms must match this prefix, and the rest of each term
> > > > > > >> >> > > is matched using fuzzy.
> > > > > > >> >> > >
> > > > > > >> >> > > Uwe
> > > > > > >> >> > >
> > > > > > >> >> > > -----
> > > > > > >> >> > > Uwe Schindler
> > > > > > >> >> > > H.-H.-Meier-Allee 63, D-28213 Bremen
> > > > > > >> >> > > http://www.thetaphi.de
> > > > > > >> >> > > eMail: uwe@thetaphi.de
> > > > > > >> >> > >
> > > > > > >> >> > > > -----Original Message-----
> > > > > > >> >> > > > From: Clemens Wyss [mailto:clemensdev@mysign.ch]
> > > > > > >> >> > > > Sent: Monday, May 02, 2011 1:47 PM
> > > > > > >> >> > > > To: java-user@lucene.apache.org
> > > > > > >> >> > > > Subject: "fuzzy prefix" search
> > > > > > >> >> > > >
> > > > > > >> >> > > > I'd like to search fuzzily but not on a full term.
> > > > > > >> >> > > > E.g.
> > > > > > >> >> > > > I have a text "Merlot del Ticino"
> > > > > > >> >> > > > I'd like
> > > > > > >> >> > > > "mer", "merr", "melo", ... to match.
> > > > > > >> >> > > >
> > > > > > >> >> > > > If I use FuzzyQuery only "merlot", "merlott" hit.
> > > > > > >> >> > > > Which query combination should I use?
> > > > > > >> >> > > >
> > > > > > >> >> > > > Thx
> > > > > > >> >> > > > Clemens
> > > > > > >> >> > > >
> > > > > > >> >> > > > ---------------------------------------------------------------------
> > > > > > >> >> > > > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > > > > > >> >> > > > For additional commands, e-mail: java-user-help@lucene.apache.org
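
On the question asked in the thread ("What would be the edit distance between "mer" and "merlot"?"), here is a rough sketch of the idea being discussed. It is plain Java with no Lucene dependency, so it illustrates the approach rather than how FuzzyQuery or EdgeNGramTokenFilter are implemented. The class and method names (FuzzyPrefixSketch, levenshtein, edgeNGrams, fuzzyPrefixMatch), the lowercasing step, and the choice of one allowed edit are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Plain-Java illustration only; none of these names are Lucene APIs.
final class FuzzyPrefixSketch {

    /** Levenshtein distance (insert, delete, substitute) using two DP rows. */
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // distance from the empty prefix of a
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    /** Leading n-grams of term with lengths minGram..maxGram (capped at term length). */
    static List<String> edgeNGrams(String term, int minGram, int maxGram) {
        List<String> grams = new ArrayList<>();
        int upper = Math.min(maxGram, term.length());
        for (int n = minGram; n <= upper; n++) {
            grams.add(term.substring(0, n));
        }
        return grams;
    }

    /** True if query is within maxEdits of some leading n-gram of term (assumed rule). */
    static boolean fuzzyPrefixMatch(String query, String term, int maxEdits) {
        String q = query.toLowerCase(Locale.ROOT);
        String t = term.toLowerCase(Locale.ROOT);
        for (String gram : edgeNGrams(t, 1, t.length())) {
            if (levenshtein(q, gram) <= maxEdits) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println("distance(mer, merlot) = " + levenshtein("mer", "merlot"));
        for (String q : new String[] {"Mer", "merr", "melo", "menlo", "m\u00e4rl", "xyz"}) {
            System.out.println(q + " -> " + fuzzyPrefixMatch(q, "Merlot", 1));
        }
    }
}
```

"mer" is three edits away from "merlot", one per missing character, so a fuzzy match against the whole term has little chance with a short query. Matching against the leading n-grams instead lets "mer", "merr", "melo", "menlo" and "märl" all land within one edit of some prefix of "merlot". In this sketch the n-grams run up to the full term length; with a cap of 4 as in the quoted EdgeNGramTokenFilter example, a five-letter query such as "menlo" would need two edits against "merl".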