lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Marcel Stör <mar...@frightanic.com>
Subject Too few search results
Date Wed, 05 Feb 2003 15:01:41 GMT
Hi all,

My Lucene IndexSearcher returns too few hits when I use some extended
query syntaxt. I'll give examples of my query/hits pairs at the bottom.
I'm indexing a database table:

<CODE>
//creating a full index
DBConnection conn = null;
try {
    Analyzer analyzer = new StandardAnalyzer();
    IndexWriter writer = new IndexWriter(indexFile, analyzer, true);
    conn =
DBConnectionFactory.getDBConnetion(DBConnectionFactory.ORACLE);
    String query = "select id, category, problem, solution from
kedb.solution";
    conn.open(cred);
    ResultSet rs = conn.execute(query);
    while (rs.next()) {
        Document doc = new Document();
        doc.add(Field.Text("id", rs.getString(1)));
        doc.add(Field.Text("category", rs.getString(2)));
        doc.add(Field.Text("problem", rs.getString(3)));
        doc.add(Field.UnStored("solution", rs.getString(4)));
        writer.addDocument(doc);
    }
    writer.close();
    conn.close();
} catch (Exception ex) {
    ex.printStackTrace();
} finally {
    conn.close();
}

//searching the index
Analyzer analyzer = new StandardAnalyzer();
Searcher searcher = new IndexSearcher(IndexReader.open(indexFile));
Query q = QueryParser.parse(query, "problem", analyzer);
Hits  hits = searcher.search(q);
for (int i = 0; i < hits.length(); i++) {
    if (0 == cat.compareTo("") || 0 ==
hits.doc(i).get("category").compareTo(cat)) {
        ids.add(hits.doc(i).get("id"));
        cats.add(hits.doc(i).get("category"));
        probs.add(hits.doc(i).get("problem"));
    }
}
</CODE>

This is sample data from the table. Right, it's German and not English!
I tried GermanAnalyzer instead of StandardAnalyzer but that made the
search results even worse.

ID;CATEGORY;PROBLEM;SOLUTION;
1;2;Irgendetwas mit dem Novell Server stimmt nicht;Server neu booten;
2;15;User kann sich nicht mehr am Novell anmelden. Kabel ist
eingesteckt. Neubooten nützt nichts. Baum wird nicht gefunden.;Baum
manuell suchen über Button auf Login-Maske.;
3;9;Makros in Word funktionieren nicht;Optionen geändert (Medium Level).
Extras - Optionen - Makros;
4;5;Scanner funktioniert nicht. Gerät erscheint nicht in der Liste der
USB Geräte.;Neusten Treiber installieren.
VORSICHT: NT 4.0 unterstützt kein USB!!!;
5;16;Maus spinnt;Reinigen/Auswechseln;
6;16;Eheprobleme;Adresse einer Beratungsstelle vermitteln;
7;11;Browser bringt imme eine Sexseite als Startseite;Extras - Optionen
- Startseite festlegen;


  Query             /hits       /expected
1 Novell            /2          /2
2 Nove*             /0          /2
3 Novel~            /0          /2
4 N?vell            /0          /2
5 +Scanner +Baum    /0          /0
6 Scanner -USB      /0          /0
7 usb AND liste     /1          /1
8 Mikros~           /1          /1
9 Ehe*              /1          /0
10 "Word Optionen"~10/1          /0
11 "Browser Sexseite"~10/1       /1


Boolean operators never seem to be a problem (ok, it's the easiest to
implement ;-)). Fuzzy searches, wildcard searches, plus proximity
searches just don't return enough hits. Very surprising are number 10 &
11. At 10 the proximity fails but at 11 everything is fine!

If you have any tips as for how to improve the search process please let
me know.

Regards,
Marcel


---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


Mime
View raw message