lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Paul Elschot <paul.elsc...@xs4all.nl>
Subject Re: Improving Index Search Performance
Date Tue, 25 Mar 2008 19:15:28 GMT
Shailendra,

Have a look at the javadocs of HitCollector:
http://lucene.apache.org/java/2_3_0/api/core/org/apache/lucene/search/HitCollector.html

The problem is with the use of the disk head, when retrieving
the documents during collecting, the disk head has to move
between the inverted index and the stored documents; see also
the file formats.

To avoid such excessive disk head movement, you need to collect
all (or at least many more than 1 of) your document ids during
collect(), for example into an int[].
After collecting retrieve the all the docs with Searcher.doc().

Also, for the same reason, retrieving docs is best done in doc id
order, but that is unlikely to go wrong as doc ids are normally
collected in increasing order.

Regards,
Paul Elschot


Op Tuesday 25 March 2008 13:43:18 schreef Shailendra Mudgal:
> Hi Everyone,
>
> We are using Lucene to search on a index of around 20G size with
> around 3 million documents. We are facing performance issues loading
> large results from the index. Based on the various posts on the forum
> and documentation, we have made the following code changes to improve
> the performance:
>
> i. Modified the code to use HitCollector instead of Hits since we
> will be loading all the documents in the index based on keyword
> matching ii. Added MapFieldSelector to load only selected fields(2
> fields only) instead of all the 14
>
> After all these changes, it seems to be  taking around 90 secs to
> load 17k documents. After profiling, we found that the max time is
> spent in * searcher.doc(id,selector).
>
> *Here is the code:
>
> *                public void collect(int id, float score) {
>                     try {
>                         MapFieldSelector selector = new
> MapFieldSelector(new String[] {COMPANY_ID, ID});
>                         doc = searcher.doc(id, selector);
>                         mappedCompanies = doc.getValues(COMPANY_ID);
>                     } catch (IOException e) {
>                         logger.debug("inside IDCollector.collect()
>
> :"+e.getMessage());
>
>                     }
>                 }*
>
> *
> *We also read in one of the posts that we should use bitSet.set(doc)
> instead of calling searcher.doc(id). But we are unable to to
> understand how this might help in our case since we will anyway have
> to load the document to get the other required field(company_id).
> Also we observed that the searcher is actually using only 1G RAM
> though we have 4G allocated to it.
>
> Can someone suggest if there is any other optimization that can done
> to improve the search performance on MultiSearcher. Any help would be
> appreciated.
>
> Thanks,
> Vipin



---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message