lucene-java-user mailing list archives

From "Uwe Schindler" <...@thetaphi.de>
Subject RE: Does Fuzzy Search scores the same as Exact Match
Date Sat, 28 Jan 2012 11:22:37 GMT
> > >> -----Original Message-----
> > >> From: Paul Taylor [mailto:paul_t100@fastmail.fm]
> > >> Sent: Saturday, January 28, 2012 10:33 AM
> > >> To: 'java-user@lucene.apache.org'
> > >> Subject: Does Fuzzy Search scores the same as Exact Match
> > >>
> > >> All things being equal, does a fuzzy match give the same score as an
> > >> exact match?
> > >> i.e. if I do a search for farmin and it matches two docs, one on term
> > >> farmin, the other on term farming, will it score farming higher or
> > >> score both the same?
> > >
> > > YES, depends on the Fuzzy configuration (rewrite method,...), but
> > > the default does so!
> > >
> > > Uwe
> > >
> > >
> > So how do I change it, seems like a funny default to have.
> 
> Maybe I was not clear, it should score "farming" higher than "farmin" by
> default, but the default rewrite mode also takes TF/IDF into account (in
> addition).

Maybe there was some confusion in your original question, so to make it clear:
if you search for "farming", then "farming" (exact match) should score higher
than "farmin" (distance 1). With the default rewrite mode this holds for the
boost, but if the typo is rarer in the corpus than the exact term, TF-IDF can
still change the final scores. You can prevent that by using a rewrite mode
that takes *only* the Levenshtein distance into account (as an inverse boost)
and does not use TF-IDF => http://goo.gl/0eJ47
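
To illustrate the difference, here is a small self-contained sketch. It does
not use the Lucene API and does not reproduce Lucene's actual formulas: the
class name, the boost formula, the IDF formula and the document frequencies
are all assumptions chosen only to show how a rare typo can outrank the exact
match once TF-IDF is multiplied in, while a distance-only boost keeps the
exact match on top.

```java
// Illustrative sketch only: hypothetical formulas and corpus statistics,
// not Lucene's implementation.
class FuzzyScoreSketch {

    // Classic dynamic-programming Levenshtein distance (two rolling rows).
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Assumed boost: 1.0 for an exact match, shrinking as the distance grows.
    static double distanceBoost(String query, String term) {
        int d = editDistance(query, term);
        return 1.0 - (double) d / Math.min(query.length(), term.length());
    }

    // Assumed textbook-style IDF: rare terms get a larger weight.
    static double idf(int docFreq, int numDocs) {
        return 1.0 + Math.log((double) numDocs / (docFreq + 1));
    }

    // "Boost only" scoring: the edit distance alone decides the order.
    static double boostOnlyScore(String query, String term) {
        return distanceBoost(query, term);
    }

    // "Boost plus TF-IDF" scoring: the boost is multiplied by the term's IDF.
    static double combinedScore(String query, String term, int docFreq, int numDocs) {
        return distanceBoost(query, term) * idf(docFreq, numDocs);
    }

    public static void main(String[] args) {
        int numDocs = 1000;      // hypothetical corpus size
        int dfFarming = 900;     // hypothetical: the exact term is very common
        int dfFarmin = 1;        // hypothetical: the typo occurs in one document

        System.out.println("boost only, farming: " + boostOnlyScore("farming", "farming"));
        System.out.println("boost only, farmin:  " + boostOnlyScore("farming", "farmin"));
        System.out.println("combined, farming:   "
                + combinedScore("farming", "farming", dfFarming, numDocs));
        System.out.println("combined, farmin:    "
                + combinedScore("farming", "farmin", dfFarmin, numDocs));
    }
}
```

With these made-up numbers the exact match wins under the distance-only
score, but the rare typo wins under the combined score, which is the effect
described above.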

> You can change that by a different rewrite method:
> 
> The default is: http://goo.gl/JhHOA (which combines the standard vector
> model with additional boosting of exact matches - we have that for
> backwards compatibility only, it's not what most users expect)
> 
> The better one is: http://goo.gl/0eJ47, which does not take TF/IDF into
> account and only boosts by Levenshtein distance.
> 
> You can also disable fuzzy boosting altogether:
> http://goo.gl/VWlkW provides two other scoring models (TF/IDF only with no
> boosting, or a constant score)
> 
> Uwe
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org

