lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From stefcl <stefatw...@gmail.com>
Subject Strange Fuzzyquery results scoring when using a low minimal distance
Date Mon, 15 Feb 2010 14:13:52 GMT

Hello,

I'm using Lucene v3. 
Please consider the following spellings 

Lucene
Lucéne
lucéne
Lucane
Lucen

When searching for "lucéne" among those words using a FuzzyQuery (with 0.5
edit distance), results show :

1. Lucene 1.0259752
2. Lucane 1.0259752
3. Lucéne 0.95660806
4. lucéne 0.95660806
5. Lucen 0.30779266

#4 is an exact match, why does it receive a lower score than "Lucane" which
contains one incorrect letter?

Also, if you raise min similarity a bit higher (0.6 of above), everything
becomes normal :

1. Lucéne 1.0438477
2. lucéne 1.0438477
3. Lucene 0.97959816
4. Lucane 0.97959816


Any idea?
Thanks in advance...


The code I use :

   /**
     * @param args the command line arguments
     */
    public static void main(String[] args) throws IOException,
ParseException
    {

        StandardAnalyzer analyzer = new
StandardAnalyzer(Version.LUCENE_CURRENT);

        // TODO code application logic here
        Directory index = new RAMDirectory();
        IndexWriter w = new IndexWriter(index, analyzer, true,
IndexWriter.MaxFieldLength.UNLIMITED);

        addDoc(w, "Lucene");
        addDoc(w, "Lucéne");
        addDoc(w, "lucéne");
        addDoc(w, "Lucane");
        addDoc(w, "Lucen");

        w.close();

        FuzzyQuery q =  new FuzzyQuery( new Term("title", "lucéne") , 0.5f
);
        
        // 3. search
        IndexSearcher searcher = new IndexSearcher(index);
        
        TopDocs collector = searcher.search(q, 10);
        ScoreDoc[] hits = collector.scoreDocs;

        // 4. display results
        System.out.println("Found " + hits.length + " hits.");
        for(int i = 0 ; i < hits.length; i++)
        {
              Document d = searcher.doc(hits[i].doc);
              System.out.println((i + 1) + ". " + d.get("title") + " " + 
hits[i].score );
        }

        // searcher can only be closed when there
        // is no need to access the documents any more.
        searcher.close();
    }


    private static void addDoc(IndexWriter w, String value) throws
IOException
    {
        Document doc = new Document();
        doc.add(new Field("title", value, Field.Store.YES,
Field.Index.ANALYZED));
        w.addDocument(doc);
    }
-- 
View this message in context: http://old.nabble.com/Strange-Fuzzyquery-results-scoring-when-using-a-low-minimal-distance-tp27594371p27594371.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message