lucene-java-user mailing list archives

From Otis Gospodnetic <otis_gospodne...@yahoo.com>
Subject Re: AW: "fuzzy prefix" search
Date Tue, 03 May 2011 11:36:17 GMT
Hi,

I didn't read this thread closely, but just in case:
* Is this something you can handle with synonyms?
* If this is for English and you are trying to handle typos, there are lists of
common English misspellings out there that you could perhaps use for this.
* Have you considered n-gramming your tokens?  I'm not sure this would help, since
I didn't read the messages/examples closely enough, but you may want to look at it
if you haven't done so yet.
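
To make the n-gram suggestion concrete, here is a minimal sketch in plain Java of "edge" n-grams, i.e. the leading substrings of a token. The class and method names are made up for illustration and this is not Lucene code; Lucene's contrib analyzers ship an EdgeNGramTokenFilter that could play this role at index time, but whether it fits the use case in this thread is an assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper, not part of Lucene: produces the leading substrings
// ("edge n-grams") of a token so that a short input such as "mer" can match
// an indexed gram of "merlot" exactly.
public class EdgeNGramSketch {

    /** Returns the lowercased prefixes of token with lengths minGram..maxGram. */
    public static List<String> edgeNGrams(String token, int minGram, int maxGram) {
        List<String> grams = new ArrayList<>();
        String lower = token.toLowerCase(Locale.ROOT);
        int upper = Math.min(maxGram, lower.length());
        for (int len = minGram; len <= upper; len++) {
            grams.add(lower.substring(0, len));
        }
        return grams;
    }

    public static void main(String[] args) {
        // prints [me, mer, merl, merlo, merlot]
        System.out.println(edgeNGrams("Merlot", 2, 6));
    }
}
```

If the grams were indexed as terms, a fuzzy match of "menlo" against the gram "merlo" would only need one edit, which is the idea behind combining n-grams with fuzzy matching.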

Otis
----
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/



----- Original Message ----
> From: Clemens Wyss <clemensdev@mysign.ch>
> To: "java-user@lucene.apache.org" <java-user@lucene.apache.org>
> Sent: Tue, May 3, 2011 5:25:30 AM
> Subject: AW: "fuzzy prefix" search
> 
> > PrefixQuery
> I'd like the combination of prefix and fuzzy ;-) because people could also
> type "menlo" or "märl", and in any of these cases I'd like to get a hit on
> Merlot (for suggesting Merlot).
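
What "prefix plus fuzzy" could mean can be sketched outside Lucene: take the smallest Levenshtein distance between the typed text and any prefix of a candidate term, and accept the candidate if that distance stays under a threshold. The class, the method names, and the threshold of 1 are assumptions for illustration; this is an in-memory sketch, not a Lucene Query.

```java
import java.util.Locale;

// Hypothetical in-memory check, not a Lucene query: how far is the typed
// text from the closest prefix of a candidate term?
public class FuzzyPrefixSketch {

    /** Smallest Levenshtein distance between query and any prefix of term. */
    public static int prefixEditDistance(String query, String term) {
        int n = query.length();
        int m = term.length();
        int[] prev = new int[m + 1];
        int[] curr = new int[m + 1];
        // Row 0: empty query versus the term prefix of length j costs j edits.
        for (int j = 0; j <= m; j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= n; i++) {
            curr[0] = i;
            for (int j = 1; j <= m; j++) {
                int cost = query.charAt(i - 1) == term.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        // prev now holds the last row: distance of the whole query to each
        // prefix of term. The minimum over that row is the prefix distance.
        int best = prev[0];
        for (int j = 1; j <= m; j++) {
            best = Math.min(best, prev[j]);
        }
        return best;
    }

    /** Case-insensitive match if some prefix of term is within maxEdits of query. */
    public static boolean fuzzyPrefixMatch(String query, String term, int maxEdits) {
        return prefixEditDistance(query.toLowerCase(Locale.ROOT),
                                  term.toLowerCase(Locale.ROOT)) <= maxEdits;
    }

    public static void main(String[] args) {
        // "m\u00e4rl" is "märl", written with an escape to avoid encoding issues.
        for (String typed : new String[] {"mer", "menlo", "m\u00e4rl", "melo"}) {
            System.out.println(typed + " -> " + fuzzyPrefixMatch(typed, "Merlot", 1));
        }
    }
}
```

With a threshold of 1, all four inputs in main match "Merlot": "mer" is an exact prefix, and the others are each one edit away from "merlo", "merl", and "merlo" respectively.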
> 
> > -----Original Message-----
> > From: Ian Lea [mailto:ian.lea@gmail.com]
> > Sent: Tuesday, May 3, 2011 11:22
> > To: java-user@lucene.apache.org
> > Subject: Re: "fuzzy prefix" search
> > 
> > I'd assumed that FuzzyQuery wouldn't ignore case, but I could be wrong.
> > What would be the edit distance between "mer" and "merlot"? Would it be
> > less than 1.5, which I reckon would be the value of length(term)*0.5 as
> > detailed in the javadocs? Seems unlikely, but I don't really know anything
> > about the Levenshtein (edit distance) algorithm as used by FuzzyQuery.
> > Wouldn't a PrefixQuery be more appropriate here?
> > 
> > 
> >  --
> > Ian.
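
The edit-distance question above can be checked with a small standalone sketch. The Levenshtein routine is the textbook algorithm. The similarity formula, 1 - distance / min(length(query), length(term)) with a prefix length of 0, is an assumption about how the Lucene 3.x FuzzyQuery scores candidates and should be verified against the javadocs or source. The class and method names are hypothetical.

```java
// Hypothetical standalone sketch, not Lucene code.
public class EditDistanceSketch {

    /** Textbook Levenshtein distance (insert, delete, substitute each cost 1). */
    public static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    /**
     * ASSUMED scoring: 1 - distance / length of the shorter string.
     * Returns 0 when either string is empty (a simplification).
     */
    public static float similarity(String query, String term) {
        int shorter = Math.min(query.length(), term.length());
        if (shorter == 0) {
            return 0.0f;
        }
        return 1.0f - (float) levenshtein(query, term) / shorter;
    }

    public static void main(String[] args) {
        System.out.println(levenshtein("mer", "merlot"));    // prints 3
        System.out.println(similarity("mer", "merlot"));     // prints 0.0
        System.out.println(similarity("merlott", "merlot") >= 0.5f); // prints true
    }
}
```

Under that assumed formula, "mer" needs three insertions to become "merlot", so the distance is 3 rather than something below 1.5, and the similarity of 0.0 falls under a minimum similarity of 0.5. "merlott" is one edit away and scores well above 0.5, which is consistent with the behavior reported in the thread.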
> > 
> > On Tue, May 3, 2011 at 10:10 AM, Clemens Wyss <clemensdev@mysign.ch> wrote:
> > > Unfortunately lowercasing doesn't help.
> > > Also, doesn't the FuzzyQuery ignore casing?
> > >
> > >> -----Original Message-----
> > >> From: Ian Lea [mailto:ian.lea@gmail.com]
> > >> Sent: Tuesday, May 3, 2011 11:06
> > >> To: java-user@lucene.apache.org
> > >> Subject: Re: "fuzzy prefix" search
> > >>
> > >> Mer != mer.  The latter will be what is indexed because
> > >> StandardAnalyzer calls LowerCaseFilter.
> > >>
> > >>  --
> > >> Ian.
> > >>
> > >>
> > >> On Tue, May 3, 2011 at 9:56 AM, Clemens Wyss <clemensdev@mysign.ch> wrote:
> > >> > Sorry for coming back to my issue. Can anybody explain why my "simple"
> > >> > unit test below fails? Any hint/help appreciated.
> > >> >
> > >> > Directory directory = new RAMDirectory();
> > >> > IndexWriter indexWriter = new IndexWriter( directory,
> > >> >     new StandardAnalyzer( Version.LUCENE_31 ),
> > >> >     IndexWriter.MaxFieldLength.UNLIMITED );
> > >> > Document document = new Document();
> > >> > document.add( new Field( "test", "Merlot",
> > >> >     Field.Store.YES, Field.Index.ANALYZED ) );
> > >> > indexWriter.addDocument( document );
> > >> > IndexReader indexReader = indexWriter.getReader();
> > >> > IndexSearcher searcher = new IndexSearcher( indexReader );
> > >> > Query q = new FuzzyQuery( new Term( "test", "Mer" ), 0.5f, 0, 10 );
> > >> > // or: Query q = new FuzzyQuery( new Term( "test", "Mer" ), 0.5f );
> > >> > TopDocs result = searcher.search( q, 10 );
> > >> > Assert.assertEquals( 1, result.totalHits );
> > >> >
> > >> > -  Clemens
> > >> >
> > >> >> -----Original Message-----
> > >> >> From: Clemens Wyss [mailto:clemensdev@mysign.ch]
> > >> >> Sent: Monday, May 2, 2011 23:01
> > >> >> To: java-user@lucene.apache.org
> > >> >> Subject: AW: "fuzzy prefix" search
> > >> >>
> > >> >> Is it the combination of FuzzyQuery and Term which makes the
> > >> >> search go for "word boundaries"?
> > >> >>
> > >> >> > -----Original Message-----
> > >> >> > From: Clemens Wyss [mailto:clemensdev@mysign.ch]
> > >> >> > Sent: Monday, May 2, 2011 14:13
> > >> >> > To: java-user@lucene.apache.org
> > >> >> > Subject: AW: "fuzzy prefix" search
> > >> >> >
> > >> >> > I tried this too, but unfortunately I only get hits when the
> > >> >> > search term is at least as long as the word to be looked up.
> > >> >> >
> > >> >> > E.g.:
> > >> >> > ...
> > >> >> > Directory directory = new RAMDirectory();
> > >> >> > IndexWriter indexWriter = new IndexWriter( directory,
> > >> >> >     IndexManager.getIndexingAnalyzer( LOCALE_DE ),
> > >> >> >     IndexWriter.MaxFieldLength.UNLIMITED );
> > >> >> >
> > >> >> > Document document = new Document();
> > >> >> > document.add( new Field( "test", "Merlot",
> > >> >> >     Field.Store.YES, Field.Index.ANALYZED ) );
> > >> >> > indexWriter.addDocument( document );
> > >> >> >
> > >> >> > IndexReader indexReader = indexWriter.getReader();
> > >> >> > IndexSearcher searcher = new IndexSearcher( indexReader );
> > >> >> >
> > >> >> > Query q = new FuzzyQuery( new Term( "test", "Mer" ), 0.6f, 1 );
> > >> >> > TopDocs result = searcher.search( q, 10 );
> > >> >> > Assert.assertEquals( 1, result.totalHits );
> > >> >> > ...
> >  >> >> >
> > >> >> > > -----Original Message-----
> > >> >> > > From: Uwe Schindler [mailto:uwe@thetaphi.de]
> > >> >> > > Sent: Monday, May 2, 2011 13:50
> > >> >> > > To: java-user@lucene.apache.org
> > >> >> > > Subject: RE: "fuzzy prefix" search
> > >> >> > >
> > >> >> > > Hi,
> > >> >> > >
> > >> >> > > You can pass an integer to FuzzyQuery which defines the number
> > >> >> > > of characters that are treated as the prefix. All terms must
> > >> >> > > match this prefix exactly, and the rest of each term is matched
> > >> >> > > fuzzily.
> > >> >> > >
> > >> >> > > Uwe
> > >> >> > >
> > >> >> > > -----
> > >> >> > > Uwe Schindler
> > >> >> > > H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de
> > >> >> > > eMail: uwe@thetaphi.de
> > >> >> > >
> > >> >> > > > -----Original Message-----
> > >> >> > > > From: Clemens Wyss [mailto:clemensdev@mysign.ch]
> > >> >> > > > Sent: Monday, May 02, 2011 1:47 PM
> > >> >> > > > To: java-user@lucene.apache.org
> > >> >> > > > Subject: "fuzzy prefix" search
> > >> >> > > >
> > >> >> > > > I'd like to search fuzzily but not on a full term.
> > >> >> > > > E.g.
> > >> >> > > > I have a text "Merlot del Ticino"
> > >> >> > > > I'd like
> > >> >> > > > "mer", "merr", "melo", ... to match.
> > >> >> > > >
> > >> >> > > > If I use FuzzyQuery, only "merlot" and "merlott" hit. What
> > >> >> > > > Query-combination should I use?
> > >> >> > > >
> > >> >> > > > Thx
> > >> >> > > > Clemens
> > >> >> > > >
> > >>  >> > > >
> > >> >> > > > ---------------------------------------------------------------------
> > >> >> > > > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > >> >> > > > For additional commands, e-mail: java-user-help@lucene.apache.org