lucene-dev mailing list archives

From "Chuck Williams (JIRA)" <>
Subject [jira] Commented: (LUCENE-558) Selective field loading
Date Sat, 29 Apr 2006 00:26:38 GMT

Chuck Williams commented on LUCENE-558:

Grant, I think that's a great idea.  I'll look at adding the extension to support the reader
optimization for ParallelReader.  Essentially it will add an optional operation that lets a
FieldSelector declare the complete list of fields it will ever load.  If that list is
declared, then ParallelReader (for example) can access only the relevant readers.
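
A rough sketch of that idea follows. It is not the Lucene API: DeclaringFieldSelector,
SubReader and ParallelLoadSketch are hypothetical names, and each sub-reader is modeled as a
plain map of stored fields. The point is only that a declared field list lets the parallel
reader skip sub-readers that hold none of those fields.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

/** Hypothetical selector that may declare every field it will ever load. */
interface DeclaringFieldSelector {
    boolean accept(String fieldName);

    /** Optional operation: the complete set of loadable fields, or empty if undeclared. */
    Optional<Set<String>> declaredFields();
}

/** Hypothetical stand-in for one sub-reader of a parallel index. */
final class SubReader {
    final Map<String, String> storedFields;
    int accessCount = 0; // counts how often this reader was actually touched

    SubReader(Map<String, String> storedFields) {
        this.storedFields = storedFields;
    }

    Map<String, String> load(DeclaringFieldSelector selector) {
        accessCount++;
        Map<String, String> out = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : storedFields.entrySet()) {
            if (selector.accept(e.getKey())) {
                out.put(e.getKey(), e.getValue());
            }
        }
        return out;
    }
}

final class ParallelLoadSketch {
    /** Builds a selector that accepts, and declares, exactly the given fields. */
    static DeclaringFieldSelector only(String... names) {
        final Set<String> wanted = new HashSet<>(Arrays.asList(names));
        return new DeclaringFieldSelector() {
            @Override public boolean accept(String fieldName) {
                return wanted.contains(fieldName);
            }
            @Override public Optional<Set<String>> declaredFields() {
                return Optional.of(wanted);
            }
        };
    }

    /** Loads a document, skipping sub-readers that hold none of the declared fields. */
    static Map<String, String> document(List<SubReader> readers, DeclaringFieldSelector selector) {
        Optional<Set<String>> declared = selector.declaredFields();
        Map<String, String> doc = new LinkedHashMap<>();
        for (SubReader reader : readers) {
            boolean irrelevant = declared.isPresent()
                && Collections.disjoint(reader.storedFields.keySet(), declared.get());
            if (irrelevant) {
                continue; // nothing this selector can ever load lives here
            }
            doc.putAll(reader.load(selector));
        }
        return doc;
    }

    public static void main(String[] args) {
        SubReader small = new SubReader(Collections.singletonMap("title", "Lucene"));
        SubReader large = new SubReader(Collections.singletonMap("body", "long text"));
        System.out.println(document(Arrays.asList(small, large), only("title")));
        System.out.println("large reader accessed: " + large.accessCount);
    }
}
```

If the selector does not declare its fields (an empty Optional), the sketch falls back to
visiting every sub-reader, which is why the declaration can stay optional.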

There is another extension I'll add unless there are objections.  The idea is to extend the
lazy loading to support streaming through readerValue() and a new streamValue() (the former
for uncompressed String fields and the latter for compressed String and binary fields).  This
will support getting a reader or stream to obtain the field value rather than reading it all
into a String or byte[].  This seems like a huge advantage in many applications (e.g., my
current one).
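
To make the proposal concrete, here is a minimal sketch of what the two accessors might look
like. LazyStoredField and StreamingFieldSketch are hypothetical names, the stored bytes are
held in a byte[] as a stand-in for the index file, and DEFLATE via java.util.zip is an
assumption about the compression used. Only the method names readerValue() and streamValue()
come from the proposal itself.

```java
import java.io.*;
import java.nio.charset.StandardCharsets;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterInputStream;

/** Hypothetical lazily loaded stored field; not the actual Lucene Fieldable API. */
final class LazyStoredField {
    private final byte[] stored;      // stand-in for the field's bytes in the index
    private final boolean compressed; // assumption: compressed values use DEFLATE

    LazyStoredField(byte[] stored, boolean compressed) {
        this.stored = stored;
        this.compressed = compressed;
    }

    /** Uncompressed String field: decode UTF-8 incrementally as the caller reads. */
    Reader readerValue() {
        if (compressed) {
            throw new IllegalStateException("use streamValue() for compressed fields");
        }
        return new InputStreamReader(new ByteArrayInputStream(stored), StandardCharsets.UTF_8);
    }

    /** Compressed String or binary field: inflate on the fly when needed. */
    InputStream streamValue() {
        InputStream raw = new ByteArrayInputStream(stored);
        return compressed ? new InflaterInputStream(raw) : raw;
    }
}

final class StreamingFieldSketch {
    /** Test helper: compresses text the way this sketch assumes the index would. */
    static byte[] deflate(String text) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (DeflaterOutputStream z = new DeflaterOutputStream(out)) {
            z.write(text.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    /** Reads a Reader to the end in small chunks, as a streaming consumer would. */
    static String drain(Reader reader) {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[8];
        try (Reader r = reader) {
            for (int n; (n = r.read(buf)) != -1; ) {
                sb.append(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        LazyStoredField plain =
            new LazyStoredField("stored text".getBytes(StandardCharsets.UTF_8), false);
        System.out.println(drain(plain.readerValue()));

        LazyStoredField packed = new LazyStoredField(deflate("compressed text"), true);
        System.out.println(
            drain(new InputStreamReader(packed.streamValue(), StandardCharsets.UTF_8)));
    }
}
```

In both cases the caller pulls the value in chunks, so a large field never has to be held in
memory as a single String or byte[].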

Supporting readerValue() this way would be a backward-incompatible change (since it would no
longer be true that exactly one of stringValue(), binaryValue() and readerValue() is non-null).
So it could be a different method, or limited to a new Fieldable subtype.

The reader returned will be fully functional; e.g., it is easy to support arbitrary mark()
and reset().
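
One plausible reason this is cheap, sketched below under the assumption that a stored value
occupies a contiguous region of a seekable input: mark() only has to remember an offset and
reset() only has to seek back to it, so no read-ahead buffer or limit is needed.
RegionInputStream is a hypothetical class, and a byte[] stands in for the index file.

```java
import java.io.InputStream;

/** Hypothetical stream over bytes [start, start + length) of a seekable source. */
final class RegionInputStream extends InputStream {
    private final byte[] source; // stand-in for a seekable index file
    private final int end;       // exclusive end offset of the field's region
    private int pos;             // current read offset
    private int marked;          // offset remembered by mark()

    RegionInputStream(byte[] source, int start, int length) {
        this.source = source;
        this.pos = start;
        this.marked = start;
        this.end = start + length;
    }

    @Override public int read() {
        return pos < end ? (source[pos++] & 0xFF) : -1;
    }

    @Override public boolean markSupported() {
        return true;
    }

    /** The read-ahead limit is ignored: any later position can be rewound. */
    @Override public void mark(int readAheadLimit) {
        marked = pos;
    }

    @Override public void reset() {
        pos = marked;
    }

    public static void main(String[] args) {
        byte[] file = "xxhello worldyy".getBytes(java.nio.charset.StandardCharsets.US_ASCII);
        RegionInputStream in = new RegionInputStream(file, 2, 11);
        in.read();      // consume the first byte of the region
        in.mark(0);     // remember the position after it
        while (in.read() != -1) { /* read to the end of the region */ }
        in.reset();     // seek back to the mark
        System.out.println((char) in.read());
    }
}
```

A character Reader over a variable-width encoding would need to track byte offsets at
character boundaries, but the same seek-based approach would apply.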

> Selective field loading
> -----------------------
>          Key: LUCENE-558
>          URL:
>      Project: Lucene - Java
>         Type: New Feature

>   Components: Index
>     Versions: 2.0
>  Environment: All
>     Reporter: Chuck Williams
>  Attachments: LuceneTrunk.patch
> Provides a new API, IndexReader.document(int doc, String[] fields).  A document containing
> only the specified fields is created.  The other fields of the document are not loaded, although
> unfortunately uncompressed strings still have to be scanned because the length information
> in the index is for UTF-8 encoded chars and not bytes.  This is useful for applications that
> need quick access to a small subset of the fields.  It can be used in conjunction with or
> for some uses instead of ParallelReader.
> This is a much smaller change for a simpler use case than LUCENE-545.  No existing APIs
> are affected.
> All the tests pass and new tests are added to verify the feature.
