lucene-java-user mailing list archives

From László Monda <l...@monda.hu>
Subject Re: Getting irrelevant results using fuzzy query
Date Mon, 23 Jun 2008 12:11:50 GMT
Thanks for your reply, Mark.



This was my original code for constructing my query using FuzzyQuery:

BooleanQuery query = new BooleanQuery();
if (artist.length() > 0) {
    FuzzyQuery artist_query = new FuzzyQuery(new Term("artist", artist));
    query.add(artist_query, BooleanClause.Occur.MUST);
}
if (song.length() > 0) {
    FuzzyQuery song_query = new FuzzyQuery(new Term("song", song));
    query.add(song_query, BooleanClause.Occur.MUST);
}



This is my first attempt to use FuzzyLikeThisQuery (with no success):

FuzzyLikeThisQuery query = new FuzzyLikeThisQuery(2, new SimpleAnalyzer());
if (artist.length() > 0) {
    query.addTerms(artist, "artist", 0.5f, 0);
}
if (song.length() > 0) {
    query.addTerms(song, "song", 0.5f, 0);
}



This is my second attempt to use FuzzyLikeThisQuery (with no success):

BooleanQuery query = new BooleanQuery();
if (artist.length() > 0) {
    FuzzyLikeThisQuery artist_query = new FuzzyLikeThisQuery(1, new SimpleAnalyzer());
    artist_query.addTerms(artist, "artist", 0.5f, 0);
    query.add(artist_query, BooleanClause.Occur.MUST);
}
if (song.length() > 0) {
    FuzzyLikeThisQuery song_query = new FuzzyLikeThisQuery(1, new SimpleAnalyzer());
    song_query.addTerms(song, "song", 0.5f, 0);
    query.add(song_query, BooleanClause.Occur.MUST);
}



I think it's my lack of understanding of how to use FuzzyLikeThisQuery
that is causing me to get irrelevant results.

Could you tell me what's wrong here, please?

Thank you.

On Mon, 2008-06-23 at 11:28 +0000, mark harwood wrote:
> >>I do have serious problems with the relevance of the results with fuzzy queries.
> 
> Please take the time to read my response here:
> 
>      http://www.gossamer-threads.com/lists/lucene/java-user/62050#62050
> 
> I had a work colleague come up with exactly the same problem this week and the solution
> is the same.
> 
> Just tested my index with a standard Lucene FuzzyQuery for "Paul~" - this gives "Phul",
> "Saul", and "Paulo" before ANY "Paul" records due to IDF issues.
> Using FuzzyLikeThisQuery puts all the "Paul" records ahead of the variants.
> 
> 
> 
> ----- Original Message ----
> From: László Monda <laci@monda.hu>
> To: java-user@lucene.apache.org
> Cc: lucenelist2007@danielnaber.de
> Sent: Monday, 23 June, 2008 12:10:05 PM
> Subject: Re: Getting irrelevant results using fuzzy query
> 
> On Wed, 2008-06-18 at 21:10 +0200, Daniel Naber wrote:
> > On Mittwoch, 18. Juni 2008, László Monda wrote:
> > 
> > > Additional info: Lucene seems to do the right thing when only few
> > > documents are present, but goes crazy when there is about 1.5 million
> > > documents in the index.
> > 
> > Lucene works well with more documents (currently using it with 9 million), 
> > but the fuzzy query requires iteration over all terms, which makes this 
> > query slow. This can be avoided by setting the prefixLength parameter of the 
> > FuzzyQuery constructor to 1 or 2. Or maybe you should use an n-gram index, 
> > see the spellchecker in the contrib area.
> 
> Thanks for the suggestion. I don't have any performance problems
> yet, but I do have serious problems with the relevance of the results
> with fuzzy queries.
> 
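
The "IDF issues" mentioned in the quoted reply above can be illustrated
with a small standalone sketch. It does not use Lucene itself; it assumes
the classic inverse document frequency formula, 1 + ln(numDocs / (docFreq + 1)),
and the document frequencies for "paul" and "phul" are made-up values chosen
only for illustration. The class and method names are hypothetical as well.

```java
public class IdfSketch {

    // Assumed classic IDF formula: 1 + ln(numDocs / (docFreq + 1)).
    // Rare terms (small docFreq) get a larger weight than common ones.
    public static double idf(int docFreq, int numDocs) {
        return 1.0 + Math.log(numDocs / (double) (docFreq + 1));
    }

    public static void main(String[] args) {
        int numDocs = 1_500_000; // index size mentioned in the thread
        int dfPaul = 5_000;      // hypothetical: "paul" is a common term
        int dfPhul = 2;          // hypothetical: "phul" is a rare variant

        // The rare variant receives the higher IDF, so a plain fuzzy match
        // on it can outscore an exact match on the common term.
        System.out.println("idf(paul) = " + idf(dfPaul, numDocs));
        System.out.println("idf(phul) = " + idf(dfPhul, numDocs));
    }
}
```

Under these assumptions the rare misspelling gets roughly twice the IDF
weight of the exact term, which would explain rare variants being ranked
ahead of exact matches when each expanded term is scored with its own IDF.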
-- 
Laci  <http://monda.hu>

