lucene-java-user mailing list archives

From stefcl <stefatw...@gmail.com>
Subject Re: Strange Fuzzyquery results scoring when using a low minimal distance
Date Tue, 16 Feb 2010 09:11:06 GMT

Thanks a lot,
But I still don't understand why raising the min similarity a little bit
changes the ordering...
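
One possible explanation, sketched below in plain Java (no Lucene dependency). It rests on assumptions about Lucene 3.0 internals rather than on anything stated in this thread: that FuzzyQuery boosts each matching term by (similarity - minSim) / (1 - minSim), that similarity is 1 - editDistance / length of the shorter term, that idf is 1 + ln(numDocs / (docFreq + 1)), and that the score is proportional to idf^2 * boost when every field holds a single term. It also assumes StandardAnalyzer lowercases, so "Lucéne" and "lucéne" share one indexed term with a document frequency of 2. The class and method names are hypothetical.

```java
public class FuzzyScoreSketch {

    // Plain Levenshtein distance: insert, delete and substitute each cost 1.
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // ASSUMPTION: similarity = 1 - distance / length of the shorter term.
    static float similarity(String query, String term) {
        int shorter = Math.min(query.length(), term.length());
        return 1.0f - (float) editDistance(query, term) / (float) shorter;
    }

    // ASSUMPTION: the boost rescales similarity from (minSim, 1] onto (0, 1].
    // Terms at or below minSim do not match and get 0 here.
    static float boost(float similarity, float minSim) {
        return similarity > minSim ? (similarity - minSim) / (1.0f - minSim) : 0.0f;
    }

    // ASSUMPTION: idf = 1 + ln(numDocs / (docFreq + 1)).
    static double idf(int docFreq, int numDocs) {
        return 1.0 + Math.log(numDocs / (double) (docFreq + 1));
    }

    // Relative score only: idf^2 * boost, ignoring factors shared by all hits.
    static double relativeScore(String query, String term, int docFreq, int numDocs, float minSim) {
        double idf = idf(docFreq, numDocs);
        return idf * idf * boost(similarity(query, term), minSim);
    }

    public static void main(String[] args) {
        String query = "luc\u00e9ne"; // "lucéne", escaped to avoid source-encoding issues
        String[] terms = {"lucene", "luc\u00e9ne", "lucane", "lucen"};
        int[] docFreqs = {1, 2, 1, 1}; // the accented term is shared by two documents
        for (float minSim : new float[] {0.5f, 0.6f}) {
            System.out.println("min similarity " + minSim);
            for (int i = 0; i < terms.length; i++) {
                double score = relativeScore(query, terms[i], docFreqs[i], 5, minSim);
                System.out.println("  " + terms[i] + " -> " + score);
            }
        }
    }
}
```

Under these assumptions the exact match always keeps a boost of 1.0, while a one-edit match drops from roughly 0.67 at min similarity 0.5 to roughly 0.58 at 0.6. The rarer terms have the larger idf, which is enough to put them ahead at 0.5 but not at 0.6. The ratios this sketch produces appear consistent with the scores quoted below, but it is not a substitute for checking with Searcher.explain().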



markharw00d wrote:
> 
> This could be down to IDF, i.e. "Lucane" is ranked higher because it is rarer
> despite having a worse edit distance.
> This is arguably a bug.
> See http://issues.apache.org/jira/browse/LUCENE-329 which discusses this.
> You could try subclassing QueryParser and overriding newFuzzyQuery to return
> FuzzyLikeThisQuery (found in "contrib/queries").
> 
> Cheers
> Mark
> 
> 
> 
> ----- Original Message ----
> From: stefcl <stefatwork@gmail.com>
> To: java-user@lucene.apache.org
> Sent: Mon, 15 February, 2010 14:13:52
> Subject: Strange Fuzzyquery results scoring when using a low minimal distance
> 
> 
> Hello,
> 
> I'm using Lucene v3.
> Please consider the following spellings:
> 
> Lucene
> Lucéne
> lucéne
> Lucane
> Lucen
> 
> When searching for "lucéne" among those words using a FuzzyQuery (with a
> minimum similarity of 0.5), the results are:
> 
> 1. Lucene 1.0259752
> 2. Lucane 1.0259752
> 3. Lucéne 0.95660806
> 4. lucéne 0.95660806
> 5. Lucen 0.30779266
> 
> #4 is an exact match, so why does it receive a lower score than "Lucane",
> which contains one incorrect letter?
> 
> Also, if you raise the min similarity a bit higher (0.6 or above), everything
> becomes normal:
> 
> 1. Lucéne 1.0438477
> 2. lucéne 1.0438477
> 3. Lucene 0.97959816
> 4. Lucane 0.97959816
> 
> 
> Any idea?
> Thanks in advance...
> 
> 
> The code I use :
> 
>     /**
>      * @param args the command line arguments
>      */
>     public static void main(String[] args) throws IOException, ParseException
>     {
> 
>         StandardAnalyzer analyzer = new StandardAnalyzer(Version.LUCENE_CURRENT);
> 
>         // 1. index the sample spellings in memory
>         Directory index = new RAMDirectory();
>         IndexWriter w = new IndexWriter(index, analyzer, true, IndexWriter.MaxFieldLength.UNLIMITED);
> 
>         addDoc(w, "Lucene");
>         addDoc(w, "Lucéne");
>         addDoc(w, "lucéne");
>         addDoc(w, "Lucane");
>         addDoc(w, "Lucen");
> 
>         w.close();
> 
>         // 2. build the fuzzy query (0.5f is the minimum similarity)
>         FuzzyQuery q = new FuzzyQuery(new Term("title", "lucéne"), 0.5f);
>         
>         // 3. search
>         IndexSearcher searcher = new IndexSearcher(index);
>         
>         TopDocs collector = searcher.search(q, 10);
>         ScoreDoc[] hits = collector.scoreDocs;
> 
>         // 4. display results
>         System.out.println("Found " + hits.length + " hits.");
>         for(int i = 0 ; i < hits.length; i++)
>         {
>             Document d = searcher.doc(hits[i].doc);
>             System.out.println((i + 1) + ". " + d.get("title") + " " + hits[i].score);
>         }
> 
>         // searcher can only be closed when there
>         // is no need to access the documents any more.
>         searcher.close();
>     }
> 
> 
>     private static void addDoc(IndexWriter w, String value) throws IOException
>     {
>         Document doc = new Document();
>         doc.add(new Field("title", value, Field.Store.YES, Field.Index.ANALYZED));
>         w.addDocument(doc);
>     }
> -- 
> View this message in context:
> http://old.nabble.com/Strange-Fuzzyquery-results-scoring-when-using-a-low-minimal-distance-tp27594371p27594371.html
> Sent from the Lucene - Java Users mailing list archive at Nabble.com.
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org

-- 
View this message in context: http://old.nabble.com/Strange-Fuzzyquery-results-scoring-when-using-a-low-minimal-distance-tp27594371p27605395.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.