lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Uwe Schindler" <>
Subject RE: How to extract Document object after the search?
Date Mon, 02 Feb 2009 12:46:30 GMT

you should generally not download all fields for all documents in the
HitCollector Loop, if you really need it (because you want to do some
analysis on the whole result set after search), you should do the following:

- only retrieve those document fields, you really need (using a
FieldSelector like SetBasedFieldSelector).
- Do some buffering in the HitCollector: Allocate an array of int for the
collected doc ids with a size of say 16,000. For each collect() call, add
the document id to the array. When the array is full and at the end of
collecting, call a flush method: This method sorts the array by ID (because
if the Ids are in increasing order less seeking is needed) and then calls
document(id) for each entry in a bulk. This is faster. In older versions of
Lucene array sorting may not needed, but you really should do it (the newer
search API may not return documents in doc order).


Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen

> -----Original Message-----
> From: mittals []
> Sent: Monday, February 02, 2009 12:54 PM
> To:
> Subject: How to extract Document object after the search?
> As per Lucene documentation -
> "For good search performance, implementations of this method should not
> call
> Searcher.doc(int) or IndexReader.document(int) on every document number
> encountered. Doing so can slow searches by an order of magnitude or more."
> My question is - what's the other way to get the Document object to avoid
> performance bottleneck?
> --
> View this message in context:
> Document-object-after-the-search--tp21788361p21788361.html
> Sent from the Lucene - Java Users mailing list archive at
> ---------------------------------------------------------------------
> To unsubscribe, e-mail:
> For additional commands, e-mail:

To unsubscribe, e-mail:
For additional commands, e-mail:

View raw message