lucene-dev mailing list archives

From "Yonik Seeley" <ysee...@gmail.com>
Subject Re: Lazy Field Loading
Date Tue, 04 Apr 2006 21:20:07 GMT
On 4/4/06, Grant Ingersoll <gsingers@syr.edu> wrote:
> I am not sure you need 509 when you have Lazy loading.

It would be nice to avoid creating a Field object at all... we have
some crazy documents with more than 1000 fields :-)  I think the Field
object itself takes up more room than the data.

For my use cases, specifying which fields should be lazily loaded
doesn't work well...  I know which fields I want, not which ones I
don't.
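
To make the difference concrete, here is a rough sketch that only counts
field names and does not touch any Lucene API. The class and method names
(FieldSelectionSketch, eagerUnderLazyList, loadedUnderWantedList) are made
up for illustration, and the 1000-field document is simulated. It compares
a "list the lazy fields" policy with a "list the wanted fields" policy.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only; none of these names exist in Lucene.
class FieldSelectionSketch {

    /** "Lazy list" policy: every field NOT named as lazy is loaded eagerly. */
    static List<String> eagerUnderLazyList(List<String> allFields, Set<String> lazy) {
        List<String> eager = new ArrayList<>();
        for (String name : allFields) {
            if (!lazy.contains(name)) {
                eager.add(name);
            }
        }
        return eager;
    }

    /** "Wanted list" policy: only the named fields are loaded at all. */
    static List<String> loadedUnderWantedList(List<String> allFields, Set<String> wanted) {
        List<String> loaded = new ArrayList<>();
        for (String name : allFields) {
            if (wanted.contains(name)) {
                loaded.add(name);
            }
        }
        return loaded;
    }

    public static void main(String[] args) {
        // Simulated document with 1000 stored fields named f0..f999.
        List<String> all = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            all.add("f" + i);
        }
        // The caller knows the two fields it needs...
        Set<String> wanted = new HashSet<>(Arrays.asList("f3", "f7"));
        // ...but can name only one field it knows to be large.
        Set<String> lazy = new HashSet<>(Arrays.asList("f999"));

        System.out.println("eager under lazy list: " + eagerUnderLazyList(all, lazy).size());
        System.out.println("loaded under wanted list: " + loadedUnderWantedList(all, wanted).size());
    }
}
```

With a lazy list, the caller has to enumerate everything it does not want
in order to avoid loading it; with a wanted list, it names two fields and
nothing else is materialized.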

> My use case is below (my guess is this is quite common).
>
> Run a search, get back your hits and display summary information on the
> hits (i.e. the "small" fields).  User picks the Hit they want to see
> more info on, go display the full document

It seems like the only way this can work is if you keep the index
searcher open and cache the Hits object that the user used.  How long
do you keep that searcher open waiting for the user to do something? 
I guess it could work as long as you have logic to re-execute the
query if the searcher changes...
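
The "re-execute if the searcher changes" logic could look roughly like the
sketch below. It is a toy model, not Lucene code: Snapshot stands in for an
open searcher over one version of the index, CachedHits for the cached
result the user is browsing, and the version number is an assumed way of
detecting that the index changed underneath the cache.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical model; Snapshot and CachedHits are not Lucene classes.
class CachedHitsSketch {

    /** Stand-in for a searcher opened on one version of the index. */
    static final class Snapshot {
        final long version;
        final List<String> docs;

        Snapshot(long version, List<String> docs) {
            this.version = version;
            this.docs = docs;
        }

        /** Returns the ids of documents whose text contains the term. */
        List<Integer> search(String term) {
            List<Integer> ids = new ArrayList<>();
            for (int i = 0; i < docs.size(); i++) {
                if (docs.get(i).contains(term)) {
                    ids.add(i);
                }
            }
            return ids;
        }
    }

    /** Cached result, tagged with the index version it was computed against. */
    static final class CachedHits {
        final String query;
        final long version;
        final List<Integer> ids;

        CachedHits(String query, long version, List<Integer> ids) {
            this.query = query;
            this.version = version;
            this.ids = ids;
        }
    }

    static CachedHits run(Snapshot snapshot, String query) {
        return new CachedHits(query, snapshot.version, snapshot.search(query));
    }

    /** Reuses the cached hits if the index is unchanged, otherwise re-runs the query. */
    static CachedHits refreshIfStale(CachedHits cached, Snapshot current) {
        return cached.version == current.version ? cached : run(current, cached.query);
    }

    public static void main(String[] args) {
        Snapshot v1 = new Snapshot(1, Arrays.asList("apple pie", "banana"));
        CachedHits hits = run(v1, "apple");

        // The index changes while the user is still looking at the summary page.
        Snapshot v2 = new Snapshot(2, Arrays.asList("banana", "apple tart", "apple pie"));
        CachedHits fresh = refreshIfStale(hits, v2);

        System.out.println("cached ids: " + hits.ids + ", refreshed ids: " + fresh.ids);
    }
}
```

Note that the document ids in the cached result are only meaningful
against the snapshot they came from, which is why the sketch re-runs the
whole query instead of reusing the old ids.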

> , including, most likely, the
> info in the really large stored fields (i.e the original document).  To
> date, I have been storing this info elsewhere b/c of the loading
> penalty.  With lazy loading, I don't need to do this.  I can just defer
> loading until the second level access is needed and I never load it if
> the user doesn't ask for it.

Actually, for really large text fields, I can see that you wouldn't
want Lucene to re-parse the fields anyway, so I agree that lazy
loading helps there.

> In the case where you only get a few smaller fields, you have to go back
> and get the document again when you want to display the contents of the
> large field.
>
> Of course, there are several other use cases where you may only want
> certain fields, but I don't think there is much cost associated with
> loading small fields, just the large ones, so you can just make them lazy.

Part of the cost is iterating through all the fields of the Document
looking for the one or two you want.
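
A callback-style reader along the lines of the FieldReader idea quoted
further down could avoid both the per-field objects and the scan over a
fully built Document. The sketch below is only an illustration under
assumptions: the record layout (field count, then field number, byte
length and UTF-8 bytes per field) is invented here and is not Lucene's
stored-field format, and the interface is cut down to string fields.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch; the record layout and all names are assumptions.
class FieldReaderSketch {

    /** Simplified version of the callback proposed in the quoted message. */
    interface FieldReader {
        boolean readField(int fieldNum, String fieldName); // true = decode this field
        boolean stringField(int fieldNum, String value);   // true = keep reading
    }

    /** Encodes values as: count, then (fieldNum, byteLength, UTF-8 bytes) per field. */
    static byte[] encode(String[] values) {
        byte[][] utf8 = new byte[values.length][];
        int size = 4;
        for (int i = 0; i < values.length; i++) {
            utf8[i] = values[i].getBytes(StandardCharsets.UTF_8);
            size += 8 + utf8[i].length;
        }
        ByteBuffer buf = ByteBuffer.allocate(size);
        buf.putInt(values.length);
        for (int i = 0; i < values.length; i++) {
            buf.putInt(i).putInt(utf8[i].length).put(utf8[i]);
        }
        return buf.array();
    }

    /** Walks one record, decoding only requested fields. Returns the number decoded. */
    static int readFields(byte[] record, String[] fieldNames, FieldReader reader) {
        ByteBuffer buf = ByteBuffer.wrap(record);
        int count = buf.getInt();
        int decoded = 0;
        for (int i = 0; i < count; i++) {
            int fieldNum = buf.getInt();
            int length = buf.getInt();
            if (!reader.readField(fieldNum, fieldNames[fieldNum])) {
                buf.position(buf.position() + length); // skip the bytes, allocate nothing
                continue;
            }
            byte[] utf8 = new byte[length];
            buf.get(utf8);
            decoded++;
            if (!reader.stringField(fieldNum, new String(utf8, StandardCharsets.UTF_8))) {
                break; // callback has everything it needs
            }
        }
        return decoded;
    }

    /** Convenience wrapper: collects the values of the wanted fields only. */
    static List<String> collect(byte[] record, String[] fieldNames, Set<String> wanted) {
        List<String> values = new ArrayList<>();
        readFields(record, fieldNames, new FieldReader() {
            public boolean readField(int fieldNum, String fieldName) {
                return wanted.contains(fieldName);
            }
            public boolean stringField(int fieldNum, String value) {
                values.add(value);
                return values.size() < wanted.size(); // stop once all wanted fields are seen
            }
        });
        return values;
    }

    public static void main(String[] args) {
        String[] names = {"id", "title", "body"};
        byte[] record = encode(new String[] {"42", "Lazy loading", "a very long body"});
        Set<String> wanted = new HashSet<>(Arrays.asList("title"));
        System.out.println(collect(record, names, wanted));
    }
}
```

The reader still has to step over every field header that precedes the
ones it wants, but unwanted values are skipped by advancing a position
instead of being decoded into objects, and reading stops as soon as the
callback returns false.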

-Yonik


> Yonik Seeley wrote:
> > On 3/31/06, Yonik Seeley <yseeley@gmail.com> wrote:
> >
> >>>         <https://issues.apache.org:443/jira/browse/LUCENE-509>
> >>>
> >> Yes, I'd personally find a way to retrieve just fields x,y, and z more
> >> useful than lazy loading.
> >>
> >
> > Thinking a little more, it would be nice if the field reading API was
> > opened up a little more so that multiple things could be done... even
> > construct different field/document objects (say a document
> > implementation that indexed the fields, etc).
> > That could be used to implement either lazy field loading, or loading
> > of specific fields.
> >
> > The lazy loading alone doesn't really address LUCENE-509
> >
> > I was thinking something along the lines of
> >
> > // an IndexReader would call FieldReader methods for each
> > abstract class FieldReader {
> >   // users return true if this field should be read.
> >   boolean readField(int fieldnum, String fieldName);
> >   // returns true to keep reading next field
> >   boolean stringField(int fieldnum, byte[] utf8);
> >   //   OR
> >   // returns true to keep reading next field
> >   boolean stringField(int fieldnum, String str);
> >   // returns true to keep reading next field
> >   boolean binaryField(int fieldnum, byte[] data);
> > }
> >
> > class IndexReader {
> >   // expert level API
> >   void readFields(int doc, FieldReader reader);
> > }
> >
> > Just brainstorming so far...
> >
> > -Yonik
> > http://incubator.apache.org/solr Solr, The Open Source Lucene Search Server
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
> > For additional commands, e-mail: java-dev-help@lucene.apache.org
> >
> >
>
> --
>
> Grant Ingersoll
> Sr. Software Engineer
> Center for Natural Language Processing
> Syracuse University
> School of Information Studies
> 335 Hinds Hall
> Syracuse, NY 13244
>
> http://www.cnlp.org
> Voice:  315-443-5484
> Fax: 315-443-6886
>
>
>
>


--
-Yonik
http://incubator.apache.org/solr Solr, The Open Source Lucene Search Server


