lucene-java-user mailing list archives

From stefcl <stefatw...@gmail.com>
Subject RE: Strange Fuzzyquery results scoring when using a low minimal distance
Date Tue, 23 Feb 2010 13:00:15 GMT

Thanks for the very detailed answer.
Using FuzzyLikeThisQuery solves the problem.



Uwe Schindler wrote:
> 
> The problem is the following:
> The docFreq of the term "lucéne" is 2, while all other terms have a docFreq
> of 1 (because StandardAnalyzer lowercases everything, "Lucéne" and "lucéne"
> are indexed as the same term). Terms with a lower docFreq get a higher score
> in TermQuery, and because your index is so small, this difference outweighs
> the boost applied by FuzzyQuery.
> 
> If you raise the minSimilarity a little, your query matches fewer terms
> and the rewritten BooleanQuery contains fewer clauses. At some point the
> extra weight of the less frequent terms no longer dominates the final
> score.
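
To make the arithmetic behind this explanation concrete, here is a minimal
sketch in plain Java (no Lucene dependency). It assumes three formulas that
are my reading of Lucene 3.0 and are not stated in this thread: the
DefaultSimilarity idf, 1 + ln(numDocs / (docFreq + 1)); a term similarity of
1 - editDistance / min(length); and a FuzzyQuery boost of
(similarity - minSimilarity) / (1 - minSimilarity). The class and method
names are made up for illustration. queryNorm, coord and field norms are left
out on the assumption that they are identical for all five single-word
documents, so only the ratios between the weights are meaningful.

```java
public final class FuzzyScoreSketch {

    /** Assumed idf formula (Lucene 3.x DefaultSimilarity). */
    static double idf(int docFreq, int numDocs) {
        return 1.0 + Math.log(numDocs / (double) (docFreq + 1));
    }

    /** Plain Levenshtein distance, computed with two rolling rows. */
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    /** Assumed similarity: 1 - distance / length of the shorter string. */
    static double similarity(String query, String term) {
        int shorter = Math.min(query.length(), term.length());
        return 1.0 - editDistance(query, term) / (double) shorter;
    }

    /** Relative weight idf^2 * boost; 0 if the term is below minSim. */
    static double weight(String query, String term, int docFreq,
                         int numDocs, double minSim) {
        double sim = similarity(query, term);
        if (sim <= minSim) {
            return 0.0; // term would not be part of the rewritten query
        }
        double boost = (sim - minSim) / (1.0 - minSim);
        double idf = idf(docFreq, numDocs);
        return idf * idf * boost;
    }

    public static void main(String[] args) {
        String query = "luc\u00e9ne";
        // Terms after lowercasing, with their document frequencies.
        String[] terms = {"luc\u00e9ne", "lucene", "lucane", "lucen"};
        int[] docFreqs = {2, 1, 1, 1};
        int numDocs = 5;
        for (double minSim : new double[] {0.5, 0.6}) {
            System.out.println("minSimilarity = " + minSim);
            for (int i = 0; i < terms.length; i++) {
                double w = weight(query, terms[i], docFreqs[i], numDocs, minSim);
                System.out.println("  " + terms[i] + " -> " + w);
            }
        }
    }
}
```

Under these assumptions the weight of "lucene" relative to the exact match
is about 1.07 at a minimum similarity of 0.5 and about 0.94 at 0.6, which is
in line with the reported score ratios (1.0259752 / 0.95660806 and
0.97959816 / 1.0438477).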
> 
> By the way, you can always look at the explain() results, which show you
> how each score was computed.
> 
> The fix (which applies only to trunk, see issue
> https://issues.apache.org/jira/browse/LUCENE-124) is to ignore the scoring
> of the TermQueries generated by FuzzyQuery and to look only at the edit
> distance. This is implemented by another MTQ.RewriteMode, which can be set
> with FuzzyQuery.setRewriteMode().
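
The idea of ranking by edit distance alone can be illustrated without Lucene.
The following is only a sketch of the principle, not the trunk
implementation: it ranks the indexed terms purely by a boost derived from
edit distance and ignores docFreq entirely. The similarity and boost formulas
are assumptions about how FuzzyQuery weights terms, and the class and method
names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public final class EditDistanceOnlyRanking {

    /** Plain Levenshtein distance, computed with two rolling rows. */
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    /** Assumed boost: scaled similarity, or 0 if the term does not match. */
    static double boost(String query, String term, double minSim) {
        int shorter = Math.min(query.length(), term.length());
        double sim = 1.0 - editDistance(query, term) / (double) shorter;
        return sim > minSim ? (sim - minSim) / (1.0 - minSim) : 0.0;
    }

    /** Matching terms ordered by boost only; ties keep their input order. */
    static List<String> rank(String query, List<String> terms, double minSim) {
        List<String> matched = new ArrayList<>();
        for (String term : terms) {
            if (boost(query, term, minSim) > 0.0) {
                matched.add(term);
            }
        }
        matched.sort(Comparator
                .comparingDouble((String t) -> boost(query, t, minSim))
                .reversed());
        return matched;
    }

    public static void main(String[] args) {
        List<String> terms = new ArrayList<>();
        terms.add("lucene");
        terms.add("luc\u00e9ne");
        terms.add("lucane");
        terms.add("lucen");
        for (String term : rank("luc\u00e9ne", terms, 0.5)) {
            System.out.println(term + " " + boost("luc\u00e9ne", term, 0.5));
        }
    }
}
```

With docFreq out of the picture, the exact match always comes first,
whatever the minimum similarity is.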
> 
> -----
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: uwe@thetaphi.de
> 
> 
>> -----Original Message-----
>> From: stefcl [mailto:stefatwork@gmail.com]
>> Sent: Tuesday, February 16, 2010 10:11 AM
>> To: java-user@lucene.apache.org
>> Subject: Re: Strange Fuzzyquery results scoring when using a low
>> minimal distance
>> 
>> 
>> Thanks a lot,
>> But I still don't understand why raising the min similarity a little
>> changes the ordering...
>> 
>> 
>> 
>> markharw00d wrote:
>> >
>> > This could be down to IDF, i.e. "Lucane" is ranked higher because it is
>> > rarer, despite having a worse edit distance.
>> > This is arguably a bug.
>> > See http://issues.apache.org/jira/browse/LUCENE-329, which discusses this.
>> > You could try subclassing QueryParser and overriding newFuzzyQuery to
>> > return a FuzzyLikeThisQuery (found in "contrib/queries").
>> >
>> > Cheers
>> > Mark
>> >
>> >
>> >
>> > ----- Original Message ----
>> > From: stefcl <stefatwork@gmail.com>
>> > To: java-user@lucene.apache.org
>> > Sent: Mon, 15 February, 2010 14:13:52
>> > Subject: Strange Fuzzyquery results scoring when using a low minimal
>> > distance
>> >
>> >
>> > Hello,
>> >
>> > I'm using Lucene v3.
>> > Please consider the following spellings
>> >
>> > Lucene
>> > Lucéne
>> > lucéne
>> > Lucane
>> > Lucen
>> >
>> > When searching for "lucéne" among those words using a FuzzyQuery (with a
>> > minimum similarity of 0.5), the results are:
>> >
>> > 1. Lucene 1.0259752
>> > 2. Lucane 1.0259752
>> > 3. Lucéne 0.95660806
>> > 4. lucéne 0.95660806
>> > 5. Lucen 0.30779266
>> >
>> > #4 is an exact match, so why does it receive a lower score than "Lucane",
>> > which contains one incorrect letter?
>> >
>> > Also, if you raise the min similarity a bit higher (0.6 or above),
>> > everything becomes normal:
>> >
>> > 1. Lucéne 1.0438477
>> > 2. lucéne 1.0438477
>> > 3. Lucene 0.97959816
>> > 4. Lucane 0.97959816
>> >
>> >
>> > Any idea?
>> > Thanks in advance...
>> >
>> >
>> > The code I use :
>> >
>> >     /**
>> >      * @param args the command line arguments
>> >      */
>> >     public static void main(String[] args) throws IOException, ParseException
>> >     {
>> >         StandardAnalyzer analyzer = new StandardAnalyzer(Version.LUCENE_CURRENT);
>> >
>> >         // 1. index the five spellings in an in-memory directory
>> >         Directory index = new RAMDirectory();
>> >         IndexWriter w = new IndexWriter(index, analyzer, true,
>> >                 IndexWriter.MaxFieldLength.UNLIMITED);
>> >
>> >         addDoc(w, "Lucene");
>> >         addDoc(w, "Lucéne");
>> >         addDoc(w, "lucéne");
>> >         addDoc(w, "Lucane");
>> >         addDoc(w, "Lucen");
>> >
>> >         w.close();
>> >
>> >         // 2. build the fuzzy query with a minimum similarity of 0.5
>> >         FuzzyQuery q = new FuzzyQuery(new Term("title", "lucéne"), 0.5f);
>> >
>> >         // 3. search
>> >         IndexSearcher searcher = new IndexSearcher(index);
>> >
>> >         TopDocs collector = searcher.search(q, 10);
>> >         ScoreDoc[] hits = collector.scoreDocs;
>> >
>> >         // 4. display results
>> >         System.out.println("Found " + hits.length + " hits.");
>> >         for (int i = 0; i < hits.length; i++)
>> >         {
>> >             Document d = searcher.doc(hits[i].doc);
>> >             System.out.println((i + 1) + ". " + d.get("title") + " " + hits[i].score);
>> >         }
>> >
>> >         // searcher can only be closed when there
>> >         // is no need to access the documents any more.
>> >         searcher.close();
>> >     }
>> >
>> >
>> >     private static void addDoc(IndexWriter w, String value) throws IOException
>> >     {
>> >         Document doc = new Document();
>> >         doc.add(new Field("title", value, Field.Store.YES, Field.Index.ANALYZED));
>> >         w.addDocument(doc);
>> >     }
>> > --
>> > View this message in context:
>> > http://old.nabble.com/Strange-Fuzzyquery-results-scoring-when-using-a-low-minimal-distance-tp27594371p27594371.html
>> > Sent from the Lucene - Java Users mailing list archive at Nabble.com.
>> >
>> >
>> > ---------------------------------------------------------------------
>> > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> > For additional commands, e-mail: java-user-help@lucene.apache.org
>> >
>> >
>> >
>> 
>> --
>> View this message in context: http://old.nabble.com/Strange-Fuzzyquery-results-scoring-when-using-a-low-minimal-distance-tp27594371p27605395.html
>> Sent from the Lucene - Java Users mailing list archive at Nabble.com.
>> 
>> 
> 
> 
> 

-- 
View this message in context: http://old.nabble.com/Strange-Fuzzyquery-results-scoring-when-using-a-low-minimal-distance-tp27594371p27702921.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.

