lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Kelvin Tan <>
Subject Document lazy-loading WAS [Re: Fast access to a random page of the search results.]
Date Tue, 08 Mar 2005 11:38:36 GMT

On Tue, 8 Mar 2005 09:56:37 +0000 (GMT), mark harwood wrote:
>> But I suppose for Document
>> has to be further subclassed so that the other
>> non-initialized fields can be obtained as well, or
> I don't think Document would be the right place for
> this - as a design pattern it is cast as a "value
> object" or "transfer object" which is passed to
> (potentially remote) clients. It shouldn't be holding
> connections to an index so it can load extra fields on
> demand.

Good point.

> the offsets are not stored and you have to
> serially read through all of the document's fields on
> disk until you reach the one you want.

How about if the structure of the Document is known (as is usually the case), and a contract
be made that the first, say 3 fields, are always lazy-loaded. That way, the iteration through
all fields can be avoided.

> There's a price to pay for allowing clients too much
> freedom and I think lazy loading of field values is an
> example of something which is too costly.
> I personally prefer a search interface which requires
> clients to state up front what fields they want
> returned - the equivalent of SQL's "Select" clause in
> addition to the usual "where.." matching criteria.

Here's my requirement, and you may have a better idea of approaching it: I need to perform
a simple "Top 10 most frequent occurring <field>" from a search. Right now, the naive
implementation simply goes through all the hits, gets the respective field(s) and performs
the counting. The requirement is not so different from sorting. Sorting loads and caches all
the values for that term in the index, but that approach is probably not optimal in this case
because there are multiple fields, and they are variable length.

Ideally, I'd like to load only the fields I need from the Documents (for counting purposes).
When the user actually views that hit, then I can load the rest of the fields for display.

Any ideas?

To unsubscribe, e-mail:
For additional commands, e-mail:

View raw message