lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Eloi Rocha Neto" <eloi.ro...@gmail.com>
Subject Help with Fuzzy Queries
Date Thu, 06 Mar 2008 18:19:05 GMT
Hi,

  I am new with Lucene.

  I dont understand how Lucene works in some cases. For example:

  If I have an index with the following three entries:
   - ATUAÇÃO FALHA DE DISJUNTOR
   - RESET DE FALHA DE DISJUNTOR
   - FALHA DE COMANDO

  When I try to look for something limilar with "FALHA DE DISJUNTOR", I've
got the following results:
     Result | score
     FALHA DE COMANDO | 0.9277342
     ATUAÇÃO FALHA DE DISJUNTOR | 0.8880876
     RESET DE FALHA DE DISJUNTOR | 0.5709133

   If you pay attention, "FALHA DE COMANDO" is comming before "ATUAÇÃO FALHA
DE DISJUNTOR", but it is not what I would like. For my client, the first
result should be "ATUAÇÃO FALHA DE DISJUNTOR".

   Other example happens when I try to look for "FALHA DISJUNTOR". For my
surprise, the unique result is "FALHA DE COMANDO". Like the other example, I
was expecting "ATUAÇÃO FALHA DE DISJUNTOR" as the first or unique result.

   What should I do in order to have the "ATUAÇÃO FALHA DE DISJUNTOR" as the
first result (or unique) in the explained cases.

   I am attaching the code at the end of this mail.

Thanks in advance,

Eloi


public class TestLucene {

    private static final String FILENAME = "test.idx";
    private static final String FIELD = "FIELD";

    private static List<Document> getDocs() {
        List<Document> docs = new ArrayList<Document>();
        docs.add(toDoc("Atuação Falha de Disjuntor"));
        docs.add(toDoc("Reset de Falha de Disjuntor"));
        docs.add(toDoc("Falha de Comando"));
        return docs;
    }

    private static Document toDoc(String text) {
        Document doc = new Document();
        doc.add(new Field(FIELD, text.toUpperCase(), Field.Store.YES,
                Field.Index.UN_TOKENIZED));
        return doc;
    }

    private static void createIndex(Collection docs) throws IOException {
        IndexWriter writer = new IndexWriter(FILENAME, new
StandardAnalyzer(),
                true);
        Iterator itDocs = docs.iterator();
        while (itDocs.hasNext()) {
            Document doc = (Document) itDocs.next();
            writer.addDocument(doc);
        }
        writer.optimize();
        writer.close();
    }

    public static void main(String[] args) throws IOException {
        createIndex(getDocs());
        IndexSearcher indexSearcher = new IndexSearcher(FILENAME);
        Hits hits = indexSearcher.search(new FuzzyQuery(new
Term(FIELD,"Falha de Disjuntor".toUpperCase()), 0.4f));
        System.out.println(hits.length());
        Iterator it = hits.iterator();
        while (it.hasNext()) {
            Hit hit = (Hit) it.next();
            System.out.println(hit.get(FIELD) + " " + hit.getScore());
        }
    }

}

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message