lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From mittals <sourabh-931.mit...@morganstanley.com>
Subject RE: How to extract Document object after the search?
Date Tue, 03 Feb 2009 07:03:31 GMT

Hi,

I have not seen much time difference between when I load the single field &
all the fields of a document.

After search, lucene cache the documents into the memory. Is there any way
to configure the no. of documents to be cached into the memory?

what could be the benefit in using FieldSelectorResult.LOAD &
FieldSelectorResult.LAZY_LOAD?

Regards,
Sourabh


Uwe Schindler wrote:
> 
> Hi,
> 
> you should generally not download all fields for all documents in the
> HitCollector Loop, if you really need it (because you want to do some
> analysis on the whole result set after search), you should do the
> following:
> 
> - only retrieve those document fields, you really need (using a
> FieldSelector like SetBasedFieldSelector).
> - Do some buffering in the HitCollector: Allocate an array of int for the
> collected doc ids with a size of say 16,000. For each collect() call, add
> the document id to the array. When the array is full and at the end of
> collecting, call a flush method: This method sorts the array by ID
> (because
> if the Ids are in increasing order less seeking is needed) and then calls
> document(id) for each entry in a bulk. This is faster. In older versions
> of
> Lucene array sorting may not needed, but you really should do it (the
> newer
> search API may not return documents in doc order).
> 
> Uwe
> 
> -----
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: uwe@thetaphi.de
> 
> 
>> -----Original Message-----
>> From: mittals [mailto:sourabh-931.mittal@morganstanley.com]
>> Sent: Monday, February 02, 2009 12:54 PM
>> To: java-user@lucene.apache.org
>> Subject: How to extract Document object after the search?
>> 
>> 
>> As per Lucene documentation -
>> "For good search performance, implementations of this method should not
>> call
>> Searcher.doc(int) or IndexReader.document(int) on every document number
>> encountered. Doing so can slow searches by an order of magnitude or
>> more."
>> 
>> My question is - what's the other way to get the Document object to avoid
>> performance bottleneck?
>> 
>> --
>> View this message in context: http://www.nabble.com/How-to-extract-
>> Document-object-after-the-search--tp21788361p21788361.html
>> Sent from the Lucene - Java Users mailing list archive at Nabble.com.
>> 
>> 
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
> 
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
> 
> 
> 

-- 
View this message in context: http://www.nabble.com/How-to-extract-Document-object-after-the-search--tp21788361p21804802.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message