lucene-dev mailing list archives

From Grant Ingersoll <gsing...@syr.edu>
Subject Re: Lazy Field Loading
Date Tue, 04 Apr 2006 20:40:00 GMT
You're right, more flexibility is needed, but in my mind it goes beyond
just field loading.  I think this is what Doug was getting at (at least
partially) with item #12 on http://wiki.apache.org/jakarta-lucene/Lucene2Whiteboard
Although that item focuses on indexing, I think it should also be considered
for searching.  I am not sure we should just keep adding more and
more methods to IndexReader.  I think the 2.x move gives us an
opportunity to refactor some of the things we think we can make better.

I am not sure you need LUCENE-509 when you have lazy loading.  In my mind,
you get the best of both worlds: you can get all the meta-info about all
the stored fields on the Document without the penalty of loading the
actual data.
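A minimal sketch of "meta-info without the data", assuming a field object that knows its name and stored length up front and reads its value only on first access. `LazyField` and its `Supplier` loader are hypothetical illustrations, not Lucene classes; the loader stands in for the seek-and-read against the stored-fields file.

```java
import java.util.function.Supplier;

// Hypothetical lazily loaded stored field (not a Lucene class).
// Name and stored length are known up front; the value is read on demand.
final class LazyField {
    private final String name;
    private final int storedLength;         // meta-info available without reading the value
    private final Supplier<String> loader;  // stands in for a seek-and-read on the index
    private String value;
    private boolean loaded = false;
    private int loadCount = 0;              // how many times the loader actually ran

    LazyField(String name, int storedLength, Supplier<String> loader) {
        this.name = name;
        this.storedLength = storedLength;
        this.loader = loader;
    }

    String name() { return name; }
    int storedLength() { return storedLength; }
    boolean isLoaded() { return loaded; }
    int loadCount() { return loadCount; }

    // Reads the value on first access and caches it for later calls.
    String value() {
        if (!loaded) {
            value = loader.get();
            loaded = true;
            loadCount++;
        }
        return value;
    }

    public static void main(String[] args) {
        LazyField body = new LazyField("body", 11, () -> "full text!!");
        // Meta-info is available while the value is still unread.
        System.out.println(body.name() + " " + body.storedLength() + " " + body.isLoaded()); // body 11 false
        // First access triggers the (simulated) read.
        System.out.println(body.value() + " " + body.isLoaded()); // full text!! true
    }
}
```

Caching after the first read means repeated accesses from the display layer cost nothing extra; the `loaded` flag (rather than a null check) keeps that true even if a stored value were legitimately null.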
 
My use case is below (my guess is this is quite common). 

Run a search, get back your hits, and display summary information on the
hits (i.e. the "small" fields).  The user picks the hit they want to see
more info on, and you display the full document, most likely including
the info in the really large stored fields (i.e. the original document).
To date, I have been storing this info elsewhere because of the loading
penalty.  With lazy loading, I don't need to do this.  I can just defer
loading until the second-level access is needed, and I never load it if
the user doesn't ask for it.
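The two-level access pattern above can be sketched as follows, assuming each hit carries its small fields eagerly and a deferred loader for the large one. `Hit`, `TwoLevelDisplay`, and the read counter are hypothetical; the counter only simulates trips to the stored-fields data.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical hit: small fields are loaded with the hit, the large body is deferred.
final class Hit {
    final String title;                        // "small" field, shown in the result list
    private final Supplier<String> bodyLoader; // simulated read of the large stored field
    private String body;                       // null until the detail view asks for it

    Hit(String title, Supplier<String> bodyLoader) {
        this.title = title;
        this.bodyLoader = bodyLoader;
    }

    String body() {
        if (body == null) body = bodyLoader.get();
        return body;
    }
}

final class TwoLevelDisplay {
    static int largeFieldReads = 0; // counts simulated reads of the large stored field

    static Hit hit(int docId) {
        return new Hit("Title " + docId, () -> {
            largeFieldReads++;
            return "Full text of document " + docId;
        });
    }

    public static void main(String[] args) {
        List<Hit> hits = new ArrayList<>();
        for (int doc = 0; doc < 10; doc++) hits.add(hit(doc));

        // Level 1: the result list only touches the small fields.
        for (Hit h : hits) System.out.println(h.title);
        System.out.println("large reads after summary: " + largeFieldReads); // 0

        // Level 2: the user picks one hit, so only that body is read.
        System.out.println(hits.get(3).body());
        System.out.println("large reads after detail: " + largeFieldReads);  // 1
    }
}
```

Hits the user never opens never pay for their large field, which is the saving that previously required storing the original documents outside the index.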

If you only load a few of the smaller fields, you have to go back and
get the document again when you want to display the contents of the
large field.
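For contrast, a sketch of the load-specific-fields approach, assuming a store where each request for a set of fields is one trip back to the index. `DocStore` and its `read` method are hypothetical and model a single document for brevity.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical stored-fields store for one document; counts how often it is (re)read.
final class DocStore {
    private final Map<String, String> stored;
    int documentReads = 0;

    DocStore(Map<String, String> stored) { this.stored = stored; }

    // Loads only the requested fields; each call counts as one trip back to the index.
    Map<String, String> read(Set<String> fields) {
        documentReads++;
        Map<String, String> out = new HashMap<>();
        for (String f : fields) {
            if (stored.containsKey(f)) out.put(f, stored.get(f));
        }
        return out;
    }

    public static void main(String[] args) {
        DocStore store = new DocStore(Map.of(
                "title", "Lazy Field Loading",
                "body", "...very large original document..."));

        // Summary view: only the small field is requested.
        Map<String, String> summary = store.read(Set.of("title"));
        System.out.println(summary.get("title"));

        // Detail view: the large field was never loaded, so the document is read again.
        Map<String, String> full = store.read(Set.of("title", "body"));
        System.out.println(full.get("body"));
        System.out.println("document reads: " + store.documentReads); // 2
    }
}
```

The second `read` is the "go back and get the document again" cost; a lazy field handle kept from the first read would avoid re-locating the document.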

Of course, there are several other use cases where you may only want
certain fields, but I don't think there is much cost associated with
loading small fields, only the large ones, so you can just make the large
ones lazy.
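One way to express "make only the large ones lazy" is a size threshold, sketched here under the assumption that the stored length of each field is known before its value is read. `LazyPolicy`, `Mode`, and the threshold value are hypothetical choices for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical policy: load small stored fields eagerly, defer the large ones.
final class LazyPolicy {
    enum Mode { EAGER, LAZY }

    // Fields at or above the threshold (in bytes) are deferred.
    static Mode choose(int storedBytes, int lazyThresholdBytes) {
        return storedBytes >= lazyThresholdBytes ? Mode.LAZY : Mode.EAGER;
    }

    // Applies the policy to every stored field, preserving the input's field order.
    static Map<String, Mode> plan(Map<String, Integer> fieldSizes, int lazyThresholdBytes) {
        Map<String, Mode> plan = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> e : fieldSizes.entrySet()) {
            plan.put(e.getKey(), choose(e.getValue(), lazyThresholdBytes));
        }
        return plan;
    }

    public static void main(String[] args) {
        Map<String, Integer> sizes = new LinkedHashMap<>();
        sizes.put("title", 40);
        sizes.put("author", 20);
        sizes.put("body", 2_000_000);
        System.out.println(plan(sizes, 64 * 1024)); // {title=EAGER, author=EAGER, body=LAZY}
    }
}
```

A per-field-name list of lazy fields would work just as well; the threshold version simply avoids having to know in advance which fields are large.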


Yonik Seeley wrote:
> On 3/31/06, Yonik Seeley <yseeley@gmail.com> wrote:
>   
>>>         <https://issues.apache.org:443/jira/browse/LUCENE-509>
>>>       
>> Yes, I'd personally find a way to retrieve just fields x,y, and z more
>> useful than lazy loading.
>>     
>
> Thinking a little more, it would be nice if the field reading API were
> opened up a little more so that multiple things could be done, even
> constructing different field/document objects (say, a document
> implementation that indexed the fields, etc.).
> That could be used to implement either lazy field loading or loading
> of specific fields.
>
> The lazy loading alone doesn't really address LUCENE-509.
>
> I was thinking something along the lines of
>
> // An IndexReader would call FieldReader methods for each field.
> abstract class FieldReader {
>   // users return true if this field should be read
>   abstract boolean readField(int fieldnum, String fieldName);
>   // returns true to keep reading the next field
>   abstract boolean stringField(int fieldnum, byte[] utf8);
>   // OR, alternatively (returns true to keep reading the next field):
>   abstract boolean stringField(int fieldnum, String str);
>   // returns true to keep reading the next field
>   abstract boolean binaryField(int fieldnum, byte[] data);
> }
>
> abstract class IndexReader {
>   // expert level API
>   abstract void readFields(int doc, FieldReader reader);
> }
>
> Just brainstorming so far...
>
> -Yonik
> http://incubator.apache.org/solr Solr, The Open Source Lucene Search Server
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-dev-help@lucene.apache.org
>
>   
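A runnable sketch of the callback idea in the quoted brainstorm, assuming only the `String` variant of `stringField`. `FieldReader` mirrors the proposed shape; `ToyStoredFields` is a hypothetical stand-in for what `IndexReader.readFields(doc, reader)` might do for one document, and `SelectiveFieldReader` is a hypothetical client that loads specific fields and stops early.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Callback shape modeled on the brainstormed FieldReader (String variant only).
abstract class FieldReader {
    // Return true if the field's value should be read at all.
    abstract boolean readField(int fieldNum, String fieldName);
    // Receives a field's value; return true to keep reading the next field.
    abstract boolean stringField(int fieldNum, String value);
}

// Hypothetical stand-in for one document's stored fields.
final class ToyStoredFields {
    private final List<String> names;
    private final List<String> values;
    int valuesRead = 0; // how many field values were actually materialized

    ToyStoredFields(List<String> names, List<String> values) {
        this.names = names;
        this.values = values;
    }

    // Drives the callbacks field by field, as readFields(doc, reader) might.
    void readFields(FieldReader reader) {
        for (int i = 0; i < names.size(); i++) {
            if (!reader.readField(i, names.get(i))) continue; // skip without reading the value
            valuesRead++;
            if (!reader.stringField(i, values.get(i))) return; // reader asked to stop
        }
    }
}

// Loads only the requested fields and stops once all of them have been seen.
final class SelectiveFieldReader extends FieldReader {
    private final Set<String> wanted;
    final Map<String, String> loaded = new LinkedHashMap<>();
    private String current; // name passed to the most recent readField call

    SelectiveFieldReader(Set<String> wanted) { this.wanted = wanted; }

    @Override
    boolean readField(int fieldNum, String fieldName) {
        current = fieldName;
        return wanted.contains(fieldName);
    }

    @Override
    boolean stringField(int fieldNum, String value) {
        loaded.put(current, value);
        return loaded.size() < wanted.size(); // stop early when nothing else is needed
    }

    public static void main(String[] args) {
        ToyStoredFields doc = new ToyStoredFields(
                List.of("title", "body", "author"),
                List.of("Lazy Field Loading", "<huge text>", "Grant"));
        SelectiveFieldReader r = new SelectiveFieldReader(Set.of("title", "author"));
        doc.readFields(r);
        System.out.println(r.loaded + " valuesRead=" + doc.valuesRead);
        // {title=Lazy Field Loading, author=Grant} valuesRead=2
    }
}
```

The same callback could instead record each field's position and build deferred handles, which is how one API might serve both lazy loading and loading of specific fields.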

-- 

Grant Ingersoll 
Sr. Software Engineer 
Center for Natural Language Processing 
Syracuse University 
School of Information Studies 
335 Hinds Hall 
Syracuse, NY 13244 

http://www.cnlp.org 
Voice:  315-443-5484 
Fax: 315-443-6886 



