lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From sim luc <sim....@gmx.net>
Subject Re: Too few search results
Date Thu, 06 Feb 2003 16:32:59 GMT
hi,
i just read an interesting bit of documentation in the api under
IndexSearcher.search(Query, Filter, HitCollector). it says:
"Applications should only use this if they need all of the matching
documents. The high-level search API (Searcher.search(Query))
is usually more efficient, as it skips non-high-scoring hits."
i don't know if the hits you are missing are "non-high-scoring",
but it might be worth checking.

> Hi all,
> 
> My Lucene IndexSearcher returns too few hits when I use some extended
> query syntaxt. I'll give examples of my query/hits pairs at the bottom.
> I'm indexing a database table:
> 
> <CODE>
> //creating a full index
> DBConnection conn = null;
> try {
>     Analyzer analyzer = new StandardAnalyzer();
>     IndexWriter writer = new IndexWriter(indexFile, analyzer, true);
>     conn =
> DBConnectionFactory.getDBConnetion(DBConnectionFactory.ORACLE);
>     String query = "select id, category, problem, solution from
> kedb.solution";
>     conn.open(cred);
>     ResultSet rs = conn.execute(query);
>     while (rs.next()) {
>         Document doc = new Document();
>         doc.add(Field.Text("id", rs.getString(1)));
>         doc.add(Field.Text("category", rs.getString(2)));
>         doc.add(Field.Text("problem", rs.getString(3)));
>         doc.add(Field.UnStored("solution", rs.getString(4)));
>         writer.addDocument(doc);
>     }
>     writer.close();
>     conn.close();
> } catch (Exception ex) {
>     ex.printStackTrace();
> } finally {
>     conn.close();
> }
> 
> //searching the index
> Analyzer analyzer = new StandardAnalyzer();
> Searcher searcher = new IndexSearcher(IndexReader.open(indexFile));
> Query q = QueryParser.parse(query, "problem", analyzer);
> Hits  hits = searcher.search(q);
> for (int i = 0; i < hits.length(); i++) {
>     if (0 == cat.compareTo("") || 0 ==
> hits.doc(i).get("category").compareTo(cat)) {
>         ids.add(hits.doc(i).get("id"));
>         cats.add(hits.doc(i).get("category"));
>         probs.add(hits.doc(i).get("problem"));
>     }
> }
> </CODE>
> 
> This is sample data from the table. Right, it's German and not English!
> I tried GermanAnalyzer instead of StandardAnalyzer but that made the
> search results even worse.
> 
> ID;CATEGORY;PROBLEM;SOLUTION;
> 1;2;Irgendetwas mit dem Novell Server stimmt nicht;Server neu booten;
> 2;15;User kann sich nicht mehr am Novell anmelden. Kabel ist
> eingesteckt. Neubooten nützt nichts. Baum wird nicht gefunden.;Baum
> manuell suchen über Button auf Login-Maske.;
> 3;9;Makros in Word funktionieren nicht;Optionen geändert (Medium 
> Level).
> Extras - Optionen - Makros;
> 4;5;Scanner funktioniert nicht. Gerät erscheint nicht in der Liste der
> USB Geräte.;Neusten Treiber installieren.
> VORSICHT: NT 4.0 unterstützt kein USB!!!;
> 5;16;Maus spinnt;Reinigen/Auswechseln;
> 6;16;Eheprobleme;Adresse einer Beratungsstelle vermitteln;
> 7;11;Browser bringt imme eine Sexseite als Startseite;Extras - Optionen
> - Startseite festlegen;
> 
> 
>   Query             /hits       /expected
> 1 Novell            /2          /2
> 2 Nove*             /0          /2
> 3 Novel~            /0          /2
> 4 N?vell            /0          /2
> 5 +Scanner +Baum    /0          /0
> 6 Scanner -USB      /0          /0
> 7 usb AND liste     /1          /1
> 8 Mikros~           /1          /1
> 9 Ehe*              /1          /0
> 10 "Word Optionen"~10/1          /0
> 11 "Browser Sexseite"~10/1       /1
> 
> 
> Boolean operators never seem to be a problem (ok, it's the easiest to
> implement ;-)). Fuzzy searches, wildcard searches, plus proximity
> searches just don't return enough hits. Very surprising are number 10 &
> 11. At 10 the proximity fails but at 11 everything is fine!
> 
> If you have any tips as for how to improve the search process please let
> me know.
> 
> Regards,
> Marcel
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
> For additional commands, e-mail: lucene-user-help@jakarta.apache.org
> 

-- 
+++ GMX - Mail, Messaging & more  http://www.gmx.net +++
NEU: Mit GMX ins Internet. Rund um die Uhr für 1 ct/ Min. surfen!


---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


Mime
View raw message