lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "J.J. Larrea" <...@panix.com>
Subject Re: What is a Hits object?
Date Wed, 05 Oct 2005 21:31:48 GMT
A Hits object is essentially a cache on query results.  It caches in 2 ways:

1. When a query returning Hits is requested, only the top 100 document IDs and scores are
requested from the scoring system, and the ID/Score pairs are stored in a list in the Hits
object.  Whenever a document ID, score, or Document object are requested that lie beyond the
end of that list, the query is reexecuted in order to grow the list to at or beyond the request,
typically 100% beyond it.

2. Returning a Document object (rather than a score or document ID) requires reconstituting
the Document from the stored fields in the index, which is an expensive operation.  The Hits
object maintains a cache of the 200 most recently requested Document objects, so it is unlikely
they will need to be reconstituted more than once.

This is all optimized around typical hitlist access patterns - navigate forward and backwards
through the results pages a small number of documents at a time.  For applications which cannot
benefit from the Hits caching, for example which employ their own hit caching layer, one can
effectively use the so-called "low level" IndexSearcher routines which return TopDocs rather
than Hits.

- J.J.

At 8:26 PM +0100 10/5/05, Cyril Barlow wrote:
>Is it an actual array of full Documents or a list of reference points to
>Documents? And what's the typical size in memory of a Hits object with say
>1000 avg size docs?

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message