lucene-dev mailing list archives

From "Donovan Aaron" <donovan_aa...@bah.com>
Subject RE: Lazy Field Loading
Date Wed, 29 Mar 2006 14:13:54 GMT
I've done a lot of work with Verity's search engine, and I like the way
they handle fields.  At query time you specify the fields you want
returned from matching documents.

Aaron
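Aaron's Verity-style suggestion (name the fields you want returned at query time) can be modeled as a retrieval-time filter. This is a hypothetical sketch. `FieldSelectionSketch` and `selectFields` are not Lucene or Verity APIs. The stored fields are an in-memory map purely for illustration; a real reader would skip unrequested fields on disk instead of building them first.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of query-time field selection; not a Lucene or Verity API.
public class FieldSelectionSketch {
    // Returns only the requested fields, in request order; unknown names are ignored.
    // The input map stands in for a document's stored fields.
    public static Map<String, String> selectFields(Map<String, String> storedFields, String... wanted) {
        Map<String, String> result = new LinkedHashMap<String, String>();
        for (String name : wanted) {
            String value = storedFields.get(name);
            if (value != null) {
                result.put(name, value);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> doc = new LinkedHashMap<String, String>();
        doc.put("id", "42");
        doc.put("title", "Lazy Field Loading");
        doc.put("body", "large stored body text");
        System.out.println(selectFields(doc, "id", "title")); // {id=42, title=Lazy Field Loading}
    }
}
```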

-----Original Message-----
From: Grant Ingersoll [mailto:gsingers@syr.edu] 
Sent: Wednesday, March 29, 2006 9:05 AM
To: java-dev@lucene.apache.org
Subject: Re: Lazy Field Loading

Hmmm, I guess I always thought of it as a property of the field that
users would want to explicitly control.  I assumed that most fields
would not be lazy and a few would be.
Now that you have made me step back a bit (in a good way), I think it
could just as easily be a size parameter: any field that is over a
specified size would be lazily loaded.  With this approach, I could see:

IndexReader.document(int docNumber, long maxFieldSizeToLoad);

and IndexReader.document(int docNum) would just call this new method
passing in some default value, say 2K or something.
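Here is a minimal sketch of the size-threshold rule. It assumes that "over a specified size" means strictly greater than the threshold, and it uses the suggested 2K default. `SizeThresholdSketch` and `loadLazily` are hypothetical names. `IndexReader.document(int, long)` is only a proposal in this thread, so the sketch models just the per-field decision and the delegating overload.

```java
// Hypothetical sketch of the proposed size-threshold policy; not Lucene code.
public class SizeThresholdSketch {
    // Assumed default, taken from the "say 2K" suggestion above.
    static final long DEFAULT_MAX_FIELD_SIZE_TO_LOAD = 2 * 1024;

    // A stored field strictly larger than the threshold is deferred.
    static boolean loadLazily(long storedFieldLength, long maxFieldSizeToLoad) {
        return storedFieldLength > maxFieldSizeToLoad;
    }

    // The one-argument form delegates with the default, the way
    // document(int) would call document(int, long).
    static boolean loadLazily(long storedFieldLength) {
        return loadLazily(storedFieldLength, DEFAULT_MAX_FIELD_SIZE_TO_LOAD);
    }

    public static void main(String[] args) {
        long[] sizes = {100, 2048, 2049, 1000000};
        for (long size : sizes) {
            System.out.println(size + " bytes -> " + (loadLazily(size) ? "lazy" : "eager"));
        }
    }
}
```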

Or, we could pass in an array of field names to be lazily loaded,
something like

IndexReader.document(int docNumber, String [] fieldNamesToLoadLazy);
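The name-based alternative might be modeled as follows. The sketch assumes that fields not named in `fieldNamesToLoadLazy` are loaded eagerly and that a null array means "nothing lazy". `LazyFieldNamesSketch` is a hypothetical helper. Unlike the constructor flag, this form keeps the decision at retrieval time, which is the point Erik raises.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of document(docNumber, fieldNamesToLoadLazy); not Lucene code.
public class LazyFieldNamesSketch {
    private final Set<String> lazyFieldNames = new HashSet<String>();

    public LazyFieldNamesSketch(String[] fieldNamesToLoadLazy) {
        // Copy into a set so each per-field check is a hash lookup.
        if (fieldNamesToLoadLazy != null) {
            lazyFieldNames.addAll(Arrays.asList(fieldNamesToLoadLazy));
        }
    }

    // True if the named field should be deferred rather than read now.
    public boolean loadLazily(String fieldName) {
        return lazyFieldNames.contains(fieldName);
    }

    public static void main(String[] args) {
        LazyFieldNamesSketch policy = new LazyFieldNamesSketch(new String[] {"body", "attachment"});
        for (String name : new String[] {"title", "body", "attachment", "id"}) {
            System.out.println(name + " -> " + (policy.loadLazily(name) ? "lazy" : "eager"));
        }
    }
}
```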

The current way I have it looks something like this for the Field
constructors (with a few other variations):

public Field(String name, String value, Store store, Index index, LazyLoad lazy)
public Field(String name, byte[] value, Store store, LazyLoad lazy)
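The pre-Java 5 "enumerated" type pattern in the spirit of `Field.TermVector` can be illustrated with a self-contained sketch of a `LazyLoad` type. In the sketch, the old-style constructor delegates to the new one with `LazyLoad.NO`, so existing callers are unaffected. `ToyField` is a stand-in rather than Lucene's `Field`, and the `Store`/`Index` parameters are omitted for brevity.

```java
// Hypothetical sketch, not Lucene source: a type-safe enum plus constructor overloads.
public class ToyField {
    // Only YES and NO can exist because the constructor is private.
    public static final class LazyLoad {
        private final String name;
        private LazyLoad(String name) { this.name = name; }
        public String toString() { return name; }

        public static final LazyLoad YES = new LazyLoad("YES");
        public static final LazyLoad NO = new LazyLoad("NO");
    }

    private final String name;
    private final String value;
    private final LazyLoad lazy;

    // Existing-style constructor: callers that never heard of LazyLoad get NO.
    public ToyField(String name, String value) {
        this(name, value, LazyLoad.NO);
    }

    // New overload that lets the indexer mark the stored value as lazy.
    public ToyField(String name, String value, LazyLoad lazy) {
        if (lazy == null) throw new IllegalArgumentException("lazy must not be null");
        this.name = name;
        this.value = value;
        this.lazy = lazy;
    }

    public String name() { return name; }
    public String stringValue() { return value; }
    public boolean isLazy() { return lazy == LazyLoad.YES; } // identity compare is safe here

    public static void main(String[] args) {
        ToyField title = new ToyField("title", "Lazy Field Loading");
        ToyField body = new ToyField("body", "large stored body text", LazyLoad.YES);
        System.out.println(title.name() + " lazy=" + title.isLazy()); // title lazy=false
        System.out.println(body.name() + " lazy=" + body.isLazy());   // body lazy=true
    }
}
```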

I am happy to do it either way since the underlying mechanics are pretty
similar.  What do others think?

-Grant

Erik Hatcher wrote:
> Lazily loaded fields will be a nice addition to Lucene.  I'm curious
> why the flag is set at indexing time rather than it being something
> that is controlled during retrieval somehow.  I'm not sure what that
> API would look like, but it seems it's a decision to be addressed
> during searching and reading of an index rather than during indexing
> itself.
>
>     Erik
>
>
> On Mar 29, 2006, at 8:31 AM, Grant Ingersoll wrote:
>
>> I have a base implementation of lazy field loading that I am starting
>> to test and wanted to run my approach by everyone to hear their
>> thoughts.
>>
>> I have, as per Doug's suggestion from a while ago, created an
>> interface named Fieldable that is implemented by Field and by a new,
>> private class owned by FieldsReader.  I have introduced an
>> "enumerated" type to the Field class named LazyLoad (which can be YES
>> or NO, in the same spirit as Field.TermVector).  Any place that used
>> to take Field now takes Fieldable.  This should be completely
>> transparent and backward-compatible.  The existing constructors of
>> Field all assume lazy loading is off.
>>
>> On creation of a Field, a user can pass in LazyLoad.YES or NO to a
>> constructor that takes either a String value or a byte array (it does
>> not apply to the Reader constructors since they do not store their
>> content).  Indexing and writing of fields take place as normal, the
>> only difference being that an extra bit is written with the field to
>> mark it as lazy.
>>
>> When a field is read in, if it is lazy, then instead of reading in the
>> value for the field and constructing a Field, we construct a LazyField
>> instance which takes in a pointer into the fieldsStream and the
>> amount of data to read.  This instance, since it is a private class
>> of FieldsReader, maintains access to the fieldsStream.  Thus, when an
>> application goes to access the value of the field, we check to see if
>> it has been loaded or not.  If it has not, we load it using the
>> fieldsStream, the pointer and the length to read.
>>
>> Does anyone see any issues with this?  I think it will only really 
>> pay off on large stored fields, but have not quantified it yet.  My 
>> main concern is the semantics of the fieldsStream and whether that 
>> would be closed behind the back of the LazyField implementation.  My 
>> understanding is that as long as the IndexReader is open, this stream
>> should also be open.  Is that correct?   What am I forgetting about?
>>
>> If testing goes well, I should be able to button this up this week or
>> next and submit the patch.
>>
>> --
>> Grant Ingersoll Sr. Software Engineer Center for Natural Language 
>> Processing Syracuse University School of Information Studies 335 
>> Hinds Hall Syracuse, NY 13244 http://www.cnlp.org Voice:  
>> 315-443-5484 Fax: 315-443-6886
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-dev-help@lucene.apache.org
>
>
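The read path described in Grant's original message can be sketched in a few steps. A flag bit is written with the field. On read, a lazy field becomes a `LazyField` holding a pointer and length, and it reads its value on first access. Everything here is an assumption for illustration: the toy record layout, the `LAZY_BIT` value, and the `ToyFieldsReader`/`LazyField` classes. A `ByteBuffer` stands in for Lucene's fieldsStream. On the open-stream question, the sketch simply fails loudly if the reader was closed before the value is loaded; it does not settle how Lucene manages the stream's lifetime.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch, not Lucene's FieldsReader. Each stored field is a toy
// record of [flags byte][int length][UTF-8 bytes]; LAZY_BIT is an assumed flag value.
public class LazyFieldSketch {
    static final byte LAZY_BIT = 0x08;

    // Writing is unchanged except for the extra flag bit.
    static void writeField(ByteBuffer out, String value, boolean lazy) {
        byte[] bytes = value.getBytes(StandardCharsets.UTF_8);
        out.put((byte) (lazy ? LAZY_BIT : 0));
        out.putInt(bytes.length);
        out.put(bytes);
    }

    // Stand-in for FieldsReader: owns the shared stream and tracks whether it is closed.
    static final class ToyFieldsReader {
        private final ByteBuffer fieldsStream;
        private boolean closed;

        ToyFieldsReader(ByteBuffer fieldsStream) { this.fieldsStream = fieldsStream; }

        void close() { closed = true; }

        // Eager fields come back as Strings, lazy ones as LazyField placeholders.
        Object readField() {
            byte flags = fieldsStream.get();
            int length = fieldsStream.getInt();
            if ((flags & LAZY_BIT) != 0) {
                LazyField field = new LazyField(fieldsStream.position(), length);
                fieldsStream.position(fieldsStream.position() + length); // skip the value for now
                return field;
            }
            byte[] bytes = new byte[length];
            fieldsStream.get(bytes);
            return new String(bytes, StandardCharsets.UTF_8);
        }

        // Non-static inner class, so it keeps access to fieldsStream and closed.
        // Not thread-safe: this is a sketch of the control flow only.
        final class LazyField {
            private final int pointer;
            private final int length;
            private String value; // null until first access

            LazyField(int pointer, int length) {
                this.pointer = pointer;
                this.length = length;
            }

            boolean isLoaded() { return value != null; }

            String stringValue() {
                if (value == null) {
                    if (closed) {
                        throw new IllegalStateException("fieldsStream closed before lazy load");
                    }
                    // A duplicate view leaves the shared stream's position untouched.
                    ByteBuffer view = fieldsStream.duplicate();
                    view.position(pointer);
                    byte[] bytes = new byte[length];
                    view.get(bytes);
                    value = new String(bytes, StandardCharsets.UTF_8);
                }
                return value;
            }
        }
    }

    // A two-field "document": a small eager title and a lazy body.
    static ToyFieldsReader openSample() {
        ByteBuffer buf = ByteBuffer.allocate(256);
        writeField(buf, "Lazy Field Loading", false);
        writeField(buf, "large stored body text", true);
        buf.flip();
        return new ToyFieldsReader(buf);
    }

    public static void main(String[] args) {
        ToyFieldsReader reader = openSample();
        String title = (String) reader.readField();
        ToyFieldsReader.LazyField body = (ToyFieldsReader.LazyField) reader.readField();
        System.out.println(title + " / loaded=" + body.isLoaded());             // Lazy Field Loading / loaded=false
        System.out.println(body.stringValue() + " / loaded=" + body.isLoaded()); // large stored body text / loaded=true
    }
}
```

Reading through a `duplicate()` view means a deferred load does not disturb the position used by the next eager read, which is one way to keep a lazy field from interfering with the reader that created it.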

