lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Erick Erickson" <erickerick...@gmail.com>
Subject Re: Catching BooleanQuery.TooManyClauses
Date Sat, 15 Apr 2006 11:44:20 GMT
With the warning that I'm not the most experienced Lucene user in the
world...

I *think*, that rather than search for each term, it's more efficient to
just use IndexReader.termDocs..... i.e.

Indexreader ir = <whatever>;
TermDocs termDocs = ir.TermDocs();
WildcardTermEnum wildEnum = <whatever>;

for (Term term = null; (term = wildEnum.term()) != null; wildEnum.next()) {
      termDocs.seek(term);
      while (termDocs.next()) {
            Document doc = reader.document(termDocs.doc())
      }
}

I know that for loop looks odd, but I just peeked at the source code for the
TermEnum classes and see why it works.

One warning, as the folks on the board have pointed out to me is that the
Hits object is not entirely efficient when you fetch lots of docs (more than
100 has been mentioned) and you should think about TopDocs or some such.

Also, if you can avoid fetching the document (i.e. get everything you want
from the index) you'll add efficiency. I have no clue how much you're
returning to the user, so I don't know whether that would work for you.....

Hope this helps
Erick

P.S. I feel kind of odd writing things like this given that Chris, Yonik,
Erik & etc. are looking over my shoulder, but if I actually offer good
advice, maybe I can save them some time since they've certainly helped me
out. And if they make alternate suggestions, they'll be doing code reviews
for me! Cool!!!!! <G>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message