lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Otis Gospodnetic <otis_gospodne...@yahoo.com>
Subject Re: AW: AW: "fuzzy prefix" search
Date Tue, 03 May 2011 19:31:12 GMT
Clemens,

Something a la:

public TokenStream tokenStream (String fieldName, Reader r) {
  return nw EdgeNGramTokenFilter(new KeywordTokenizer(r), 
EdgeNGramTokenFilter.Side.FRONT, 1, 4);
}


Check out page 265 of Lucene in Action 2.

Otis
----
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/



----- Original Message ----
> From: Clemens Wyss <clemensdev@mysign.ch>
> To: "java-user@lucene.apache.org" <java-user@lucene.apache.org>
> Sent: Tue, May 3, 2011 12:57:39 PM
> Subject: AW: AW: "fuzzy prefix" search
> 
> How does an simple Analyzer look that just "n-grams" the  docs/fields.
> 
> class SimpleNGramAnalyzer extends  Analyzer
> {
> @Override
> public TokenStream tokenStream ( String fieldName,  Reader reader )
> {
>    EdgeNGramTokenFilter...  ???
> }
> }
> 
> > -----Ursprüngliche Nachricht-----
> > Von: Otis  Gospodnetic [mailto:otis_gospodnetic@yahoo.com]
> >  Gesendet: Dienstag, 3. Mai 2011 13:36
> > An: java-user@lucene.apache.org
> >  Betreff: Re: AW: "fuzzy prefix" search
> > 
> > Hi,
> > 
> > I  didn't read this thread closely, but just in case:
> > * Is this something  you can handle with synonyms?
> > * If this is for English and you are  trying to handle typos, there is a list 
>of
> > common English misspellings  out there that you could use for this perhaps.
> > * Have you considered  n-gramming your tokens?  Not sure if this would help,
> > didn't read  messages/examples closely enough, but you may want to look at
> > this if  you haven't done so yet.
> > 
> > Otis
> > ----
> > Sematext :: http://sematext.com/ :: Solr -  Lucene - Nutch Lucene ecosystem
> > search :: http://search-lucene.com/
> > 
> > 
> > 
> > ----- Original  Message ----
> > > From: Clemens Wyss <clemensdev@mysign.ch>
> > > To:  "java-user@lucene.apache.org"  <java-user@lucene.apache.org>
> >  > Sent: Tue, May 3, 2011 5:25:30 AM
> > > Subject: AW: "fuzzy prefix"  search
> > >
> > > >PrefixQuery
> > > I'd like the  combination of prefix and fuzzy ;-) because  people could
> > >also  type "menlo" or "märl" and in any of these cases I'd like to  get
> >  >a hit on Merlot (for suggesting Merlot)
> > >
> > > >  -----Ursprüngliche  Nachricht-----
> > > > Von: Ian Lea  [mailto:ian.lea@gmail.com]
> > > >  Gesendet:  Dienstag, 3. Mai 2011 11:22
> > > > An: java-user@lucene.apache.org
> >  > >  Betreff: Re: "fuzzy prefix" search
> > > >
> >  > > I'd assumed that FuzzyQuery  wouldn't ignore case but I could be  
wrong.
> > > >  What would be the edit  distance between  "mer" and "merlot"? Would
> > > > it be less that 1.5 which I   reckon would be the value of
> > > > length(term)*0.5 as detailed in  the  javadocs?  Seems unlikely, but
> > > > I don't really  know anything about  the Levenshtein (edit distance)
> > algorithm as  used by FuzzyQuery.
> > > >  Wouldn't a PrefixQuery be more  appropriate here?
> > > >
> > > >
> > > >   --
> > > > Ian.
> > > >
> > > > On Tue, May 3,  2011 at 10:10 AM, Clemens Wyss
> > > > <clemensdev@mysign.ch>
> > >  >  wrote:
> > > > > Unfortunately lowercasing doesn't  help.
> > > > > Also,  doesn't the FuzzyQuery ignore  casing?
> > > > >
> > > > >>   -----Ursprüngliche Nachricht-----
> > > > >> Von: Ian Lea  [mailto:ian.lea@gmail.com]
> > > >  >>  Gesendet: Dienstag, 3. Mai 2011 11:06
> > > > >>  An: java-user@lucene.apache.org
> >  > >  >> Betreff: Re: "fuzzy prefix" search
> > > >  >>
> > > > >>  Mer != mer.  The latter will be  what is indexed because
> > > > >> StandardAnalyzer calls  LowerCaseFilter.
> > > > >>
> > > > >>   --
> > > > >> Ian.
> > > > >>
> > > >  >>
> > > > >> On  Tue, May 3, 2011 at 9:56 AM,  Clemens Wyss
> > > > <clemensdev@mysign.ch>
> > >  > >>  wrote:
> > > > >> > Sorry for coming back  to my issue. Can anybody  explain
why my
> > > > "simple"
> >  > > >> unit test below fails? Any  hint/help  appreciated.
> > > > >> >
> > > > >> >  Directory  directory = new RAMDirectory(); IndexWriter
> > > >  >> > indexWriter =  new IndexWriter( directory, new
> > >  > >> > StandardAnalyzer(
> > > >   Version.LUCENE_31
> > > > >> > ),  IndexWriter.MaxFieldLength.UNLIMITED  ); Document document
> >  =
> > > > new
> > > > >> > Document();   document.add( new Field( "test", "Merlot",
> > > > >> >  Field.Store.YES, Field.Index.ANALYZED ) );
> > > > >> >  indexWriter.addDocument(
> > > >  >> > document );  IndexReader indexReader =
> > > > indexWriter.getReader();
> >  > > >> > IndexSearcher searcher = new  IndexSearcher(  indexReader
);
> > > > >> > Query q = new FuzzyQuery(   new Term( "test", "Mer" ), 0.5f,
0,
> > > > >> > 10 ); // or  Query q =  new FuzzyQuery( new Term( "test",
"Mer"
> > > >  >> > ), 0.5f); TopDocs  result = searcher.search( q, 10 );
> >  > > >> > Assert.assertEquals( 1,  result.totalHits  );
> > > > >> >
> > > > >> > -   Clemens
> > > > >> >
> > > > >> >>  -----Ursprüngliche  Nachricht-----
> > > > >> >> Von:  Clemens Wyss [mailto:clemensdev@mysign.ch]
> > > >  >>  >> Gesendet: Montag, 2. Mai 2011 23:01
> > > >  >> >> An: java-user@lucene.apache.org
> >  > >  >> >> Betreff: AW: "fuzzy prefix" search
> >  > > >>  >>
> > > > >> >> Is it the  combination of FuzzyQuery and Term  which
makes the
> > > >  >> >> search to go for "word  boundaries"?
> > > >  >> >>
> > > > >> >> >   -----Ursprüngliche Nachricht-----
> > > > >> >> > Von:  Clemens  Wyss [mailto:clemensdev@mysign.ch]
> > > >  >>  >> > Gesendet: Montag, 2. Mai 2011 14:13
> > >  > >> >> >  An: java-user@lucene.apache.org
> >  > >  >> >> > Betreff: AW: "fuzzy prefix"  search
> > > > >>  >> >
> > > > >>  >> > I tried this too, but unfortunately  I only get
hits  when
> > > > >> >> > the search term is a least   as long as the word
to be looked 
up.
> > > > >> >>  >
> > > >  >> >> > E.g.:
> > > >  >> >> > ...
> > > > >>  >> >  Directory directory = new RAMDirectory(); IndexWriter
> > > >   >> >> > indexWriter = new IndexWriter( directory,  >>
 >> >
> > > > IndexManager.getIndexingAnalyzer(
> > >  > >>  >> LOCALE_DE ),
> > > > >> >>  >               IndexWriter.MaxFieldLength.UNLIMITED
);
> > > > >> >>  >
> > > >  >> >> > Document document = new  Document(); document.add(
new
> > > > Field(
> > > >  >> >> > "test", "Merlot",
> > > > >>   >> >             Field.Store.YES,  Field.Index.ANALYZED
) );
> > > >  >> >>  indexWriter.addDocument(
> > > > >> >> >  document  );
> > > > >> >> >
> > > > >> >>  >  IndexReader indexReader = indexWriter.getReader();
> > > >  >> >> > IndexSearcher
> > > >  >> >>  > searcher = new IndexSearcher( indexReader );
 >> >>  >
> > > > >> >> > Query q = new FuzzyQuery(   new Term( "test", "Mer"
), 0.6f,
> > > > >> >> > 1 );  TopDocs  result = searcher.search( q, 10
);
> > > > >>  >> > Assert.assertEquals(
> > > > >>  >>  > 1,
> > > > >> >> result.totalHits ); ...
> > >  >  >> >> >
> > > > >> >> > >  -----Ursprüngliche  Nachricht-----
> > > > >> >> >  > Von: Uwe Schindler [mailto:uwe@thetaphi.de]
> > > > >>  >>  > > Gesendet: Montag, 2. Mai 2011 13:50
> > > >  >> >> >  > An: java-user@lucene.apache.org
> >  > >  >> >> > > Betreff: RE: "fuzzy prefix"  search
> > > > >>  >> > >
> > > >  >> >> > > Hi,
> > > > >>  >> >  >
> > > > >> >> > > You can pass an integer   to FuzzyQuery which
defines the
> > > > >> >> > >  number of  characters that are seen as prefix.
So all
> > > >  >> >> > > terms must match
> > > > >>   >> > > this prefix and the rest of each term is
matched using   
>fuzzy.
> > > > >> >> > >
> > > > >>  >> > >  Uwe
> > > > >> >> >  >
> > > > >> >> > >  -----
> > > >  >> >> > > Uwe Schindler
> > > > >>   >> > > H.-H.-Meier-Allee 63, D-28213 Bremen
> > > >  >> http://www.thetaphi.de
> > > >  >> >> > >  eMail: uwe@thetaphi.de
> > > > >>  >> >  >
> > > > >> >> > > >  -----Original Message-----
> > > >  >> >> > >  > From: Clemens Wyss [mailto:clemensdev@mysign.ch]
> > > >  >>  >> > > > Sent: Monday, May 02, 2011 1:47 PM
  >> > > > To:
> > > > >> java-user@lucene.apache.org
> >  > >  >> >> > > > Subject: "fuzzy prefix"  search  >>
>> > > >
> > > > >>  >> > > > I'd  like to search fuzzily but not
on a full  term.
> > > > >> >> >  > > E.g.
> > >  > >> >> > > > I have a text "Merlot  del  Ticino"
> > > > >> >> > > > I'd like
> > >  > >>  >> > > > "mer", "merr", "melo", ... to  match.
> > > > >>  >> > > >
> > >  > >> >> > > > If I use  FuzzyQuery only  "merlot,
 "merlott" hit. What
> > > > >> >>  >  > > Query-combination should I use?
> > > > >>  >> > >  >
> > > > >> >> > > >  Thx
> > > > >> >> >  > > Clemens
> >  > > >> >> > > >
> > > > >>   >> > > >
> > > > >> >> > >  >
> > > > >> >> > > >  --------------------------------------------------------
> > > >  >> >> > > > ----
> > > > >>  >>  > > > ---
> > > > >> >> > > >  ---
> > > >  >> >> > > > --
> > >  > >> >> > > > -  To unsubscribe, e-mail:
> >  > > >> >> > > > java-user-unsubscribe@lucene.apache.org
> >  > >  >> >> > > > For additional commands,  e-mail:
> > > >  >> >> > > > java-user-help@lucene.apache.org   >>
>> > >
> > > > >> >> >  >
> > > >  >> >> > >
> > > >  >> >> > >
> > > > >> >> > >  ----------------------------------------------------------
> > > >  >> >> > > ----
> > > > >>  >> >  > ---
> > > > >> >> > > ---
> > > >  >>  >> > > - To unsubscribe, e-mail:
> > > >  >> >> > > java-user-unsubscribe@lucene.apache.org
> >  > >  >> >> > > For additional commands,  e-mail:
> > > > >>  >> > > java-user-help@lucene.apache.org
> >  > >  >> >> >
> > > > >> >>  >
> > > > >> >>  >
> > > > >>  >> --------------------------------------------------------------
> >  > > >> >> --
> > > >  >> >> >  ---
> > > > >> >> > -- To unsubscribe,   e-mail:
> > > > >> >> > java-user-unsubscribe@lucene.apache.org
> >  > >  >> >> > For additional commands, e-mail:
> >  > > >>  >> > java-user-help@lucene.apache.org
> >  > >  >> >>
> > > > >> >>
> >  > > >> >>
> > > > >> >>  --------------------------------------------------------------
> > > >  >> >> ----
> > > >  >> >> --- To  unsubscribe, e-mail:
> > > > >> >> java-user-unsubscribe@lucene.apache.org
> >  > >  >> >> For additional commands, e-mail:
> > >  > java-user-help@lucene.apache.org   >> >
> > > > >> >
> > > > >>  >
> > > > >> >  ---------------------------------------------------------------
> > >  > >> > ----
> > > >  >> > -- To unsubscribe,  e-mail:
> > > > java-user-unsubscribe@lucene.apache.org
> >  > >  >> > For additional commands, e-mail:
> > >  > java-user-help@lucene.apache.org   >> >
> > > > >> >
> > > >  >>
> > > > >>
> > > > >>  -----------------------------------------------------------------
> > >  > >> ----
> > > >  >> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> >  > >  >> For additional commands, e-mail:
> > > > java-user-help@lucene.apache.org   >
> > > > >
> > > > >
> > > > >  ------------------------------------------------------------------
> > >  > > ---
> > > >  > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> >  > >  > For additional commands, e-mail: java-user-help@lucene.apache.org
> >  > > >
> > > > >
> > > >
> > >  >
> > > >  --------------------------------------------------------------------
> >  > > - To  unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> >  > >  For additional commands, e-mail: java-user-help@lucene.apache.org
> >  >
> > >
> > >  ---------------------------------------------------------------------
> >  > To  unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> >  > For  additional commands, e-mail: java-user-help@lucene.apache.org
> >  >
> > >
> > 
> >  ---------------------------------------------------------------------
> > To  unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> >  For additional commands, e-mail: java-user-help@lucene.apache.org
> 
> 
> ---------------------------------------------------------------------
> To  unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For  additional commands, e-mail: java-user-help@lucene.apache.org
> 
> 

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message