lucene-dev mailing list archives

From mark harwood
Subject Re: Lazy Field Loading
Date Fri, 31 Mar 2006 14:31:58 GMT
> I don't think option 3 is baked in at indexing time.

Sorry, I misread it. Yes, that is another option.

So if options 3 and 4 are about search-time selection
(based on size and fieldname respectively) can they be
generalized into a more wide-reaching retrieval API?

You can imagine a high-level retrieval language like

  Select url, length(contents), substring(descr,0,50)

...where three items are returned. The first item
(url) is a straight copy of the original field data,
the second is the size in bytes of the "contents"
field, and the third is a summary of the "descr" field
(in this case a simple substring, but it could conceivably
be a more sophisticated summarizer, e.g. the highlighter).

If you think of each of these as retrieval functions
we have an API that looks something like this:

 IndexReader.document(int doc, 
        RetrieveFunction []retrievers);

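interface RetrieveFunction {
  // Returns the value this retriever produces for field f
  Object getValue(FieldMetaData f);
}

interface FieldMetaData {
  String getFieldName();
  int getSize();
  // Content is only read from disk when this is called
  InputStream getInputStream();
}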

The reader calls each retriever with a FieldMetaData
object for each field, and the data is only loaded from
disk if a RetrieveFunction "bites" and asks for the
InputStream to get the content of a field.
The different RetrieveFunction implementations could
then choose not only which fields are returned but also
how much content and in what format.
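
To make the idea concrete, here is a rough, self-contained sketch of how the
"Select url, length(contents), substring(descr,0,50)" example might map onto
these interfaces. Everything beyond the two interfaces is an assumption made
up for illustration: StoredField is an in-memory stand-in for a stored field,
document(...) stands in for the proposed IndexReader method, and copy, length
and prefix are hypothetical retrievers. None of this is existing Lucene API.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical interfaces following the proposal; not part of Lucene.
interface FieldMetaData {
    String getFieldName();
    int getSize();                 // size in bytes, available without loading content
    InputStream getInputStream();  // asking for this is what triggers the load
}

interface RetrieveFunction {
    // Returns a value for a field this retriever wants, or null to pass.
    Object getValue(FieldMetaData f);
}

final class LazyRetrievalSketch {

    // In-memory stand-in for a stored field; counts how often content is opened.
    static final class StoredField implements FieldMetaData {
        private final String name;
        private final byte[] bytes;
        int loads = 0;

        StoredField(String name, String value) {
            this.name = name;
            this.bytes = value.getBytes(StandardCharsets.UTF_8);
        }
        public String getFieldName() { return name; }
        public int getSize() { return bytes.length; }
        public InputStream getInputStream() {
            loads++;
            return new ByteArrayInputStream(bytes);
        }
    }

    // Reads at most maxBytes from the field (assumes single-byte characters).
    static String readPrefix(FieldMetaData f, int maxBytes) {
        byte[] buf = new byte[maxBytes];
        int total = 0;
        int n;
        try (InputStream in = f.getInputStream()) {
            while (total < maxBytes && (n = in.read(buf, total, maxBytes - total)) > 0) {
                total += n;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new String(buf, 0, total, StandardCharsets.UTF_8);
    }

    // "url": straight copy of the field data.
    static RetrieveFunction copy(String field) {
        return f -> f.getFieldName().equals(field) ? readPrefix(f, f.getSize()) : null;
    }

    // "length(contents)": answered from metadata alone, content is never opened.
    static RetrieveFunction length(String field) {
        return f -> f.getFieldName().equals(field) ? Integer.valueOf(f.getSize()) : null;
    }

    // "substring(descr,0,50)": loads only the leading bytes.
    static RetrieveFunction prefix(String field, int maxBytes) {
        return f -> f.getFieldName().equals(field)
                ? readPrefix(f, Math.min(maxBytes, f.getSize())) : null;
    }

    // Stand-in for the proposed IndexReader.document(int, RetrieveFunction[]).
    static List<Object> document(List<? extends FieldMetaData> fields,
                                 RetrieveFunction[] retrievers) {
        List<Object> out = new ArrayList<>();
        for (RetrieveFunction r : retrievers) {
            for (FieldMetaData f : fields) {
                Object v = r.getValue(f);
                if (v != null) {
                    out.add(v);
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<StoredField> doc = new ArrayList<>();
        doc.add(new StoredField("url", "http://example.com/a"));
        doc.add(new StoredField("contents", "a large body that is never read"));
        doc.add(new StoredField("descr", "Lucene is a search library"));

        RetrieveFunction[] select = { copy("url"), length("contents"), prefix("descr", 6) };
        System.out.println(document(doc, select));
        System.out.println("contents loads: " + doc.get(1).loads);
    }
}
```

The point of the load counter is just to show that length("contents") never
touches the content stream, which is the lazy-loading behaviour being asked
for. A real reader would presumably walk the fields in stored order rather
than once per retriever; the loop order here only keeps results in "select"
order.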

I'm not sure there is a sufficiently long list of
different retriever functions to make this a
useful approach.
