lucene-dev mailing list archives

From "Chuck Williams (JIRA)" <j...@apache.org>
Subject [jira] Updated: (LUCENE-545) Field Selection and Lazy Field Loading
Date Wed, 03 May 2006 09:30:46 GMT
     [ http://issues.apache.org/jira/browse/LUCENE-545?page=all ]

Chuck Williams updated LUCENE-545:
----------------------------------

    Attachment: LazyFields.tar.gz

Continuing the discussion from LUCENE-558, LazyFields.tar.gz extends this patch (LUCENE-545)
with an additional optimization: ParallelReader no longer reads fields from any reader whose
fields are all NO_LOAD.  No change to the FieldSelector interface was required to achieve
this.  A useful new FieldSelector, MapFieldSelector, is also provided, and TestParallelReader
is extended to test both additions.

Bug fixes to ParallelReader from LUCENE-561 are also included.
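
A minimal, self-contained sketch of the idea behind the optimization, not the code in the
patch.  FieldSelectorResult is modelled as an enum, SubReader and readersToVisit are
hypothetical stand-ins for ParallelReader's internals, and the NO_LOAD default for unmapped
fields in this MapFieldSelector is an assumption:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ParallelSkipSketch {
    // The four results named in the issue, modelled here as an enum (assumption).
    enum FieldSelectorResult { LOAD, LAZY_LOAD, NO_LOAD, LOAD_AND_BREAK }

    interface FieldSelector {
        FieldSelectorResult accept(String fieldName);
    }

    /** Map-backed selector; fields missing from the map default to NO_LOAD (assumption). */
    static final class MapFieldSelector implements FieldSelector {
        private final Map<String, FieldSelectorResult> results;

        MapFieldSelector(Map<String, FieldSelectorResult> results) {
            this.results = new HashMap<>(results);
        }

        @Override
        public FieldSelectorResult accept(String fieldName) {
            FieldSelectorResult result = results.get(fieldName);
            return result != null ? result : FieldSelectorResult.NO_LOAD;
        }
    }

    /** Hypothetical stand-in for one sub-reader and the field names it stores. */
    static final class SubReader {
        final String name;
        final List<String> fieldNames;

        SubReader(String name, String... fieldNames) {
            this.name = name;
            this.fieldNames = Arrays.asList(fieldNames);
        }
    }

    /** Returns the names of the readers that hold at least one field that is not NO_LOAD. */
    static List<String> readersToVisit(List<SubReader> readers, FieldSelector selector) {
        List<String> visited = new ArrayList<>();
        for (SubReader reader : readers) {
            for (String field : reader.fieldNames) {
                if (selector.accept(field) != FieldSelectorResult.NO_LOAD) {
                    visited.add(reader.name);
                    break; // one wanted field is enough to require this reader
                }
            }
        }
        return visited;
    }

    static List<String> demo() {
        Map<String, FieldSelectorResult> wanted = new HashMap<>();
        wanted.put("title", FieldSelectorResult.LOAD);
        wanted.put("body", FieldSelectorResult.LAZY_LOAD);
        List<SubReader> readers = Arrays.asList(
            new SubReader("main", "title", "body"),
            new SubReader("audit", "createdBy", "revision"));
        return readersToVisit(readers, new MapFieldSelector(wanted));
    }

    public static void main(String[] args) {
        System.out.println(demo()); // only the "main" reader needs to be read
    }
}
```

Because the decision uses only accept(String), it fits the statement above that the
FieldSelector interface did not need to change.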

Keeping the pieces separately factored, and managing them alongside my other local changes,
has led to a slightly more complex file structure.  The steps to use LazyFields.tar.gz are:

1. Unpack it
2. Apply fieldSelectorPatch.txt
3. Apply ParallelReader.patch
4. Apply TestParallelReader.patch
5. Unpack and copy fieldSelectorNewFiles.tar.gz
6. Copy LazyFields.new

The target of all patch applications and copies is the Lucene trunk.

When I applied fieldSelectorPatch.txt against the latest Lucene trunk, a couple of hunks
failed to apply, but they were not relevant.  The version included here is the original,
unchanged.


> Field Selection and Lazy Field Loading
> --------------------------------------
>
>          Key: LUCENE-545
>          URL: http://issues.apache.org/jira/browse/LUCENE-545
>      Project: Lucene - Java
>         Type: New Feature
>   Components: Store
>     Versions: 2.0
>     Reporter: Grant Ingersoll
>     Priority: Minor
>  Attachments: LazyFields.tar.gz, fieldSelectorPatch.txt, newFiles.tar.gz
>
> The patch to come shortly implements a Field Selection and Lazy Loading mechanism for
> Document loading on the IndexReader.
> It introduces a FieldSelector interface that defines the accept method:
>     FieldSelectorResult accept(String fieldName);
> (Perhaps we want to expand this to take in other parameters, such as the field metadata
> (term vector, etc.).)
> Anyone can implement a FieldSelector to define how they want to load fields for a Document.
> The FieldSelectorResult can be one of four values: LOAD, LAZY_LOAD, NO_LOAD, LOAD_AND_BREAK.
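
For illustration, a custom selector against the interface quoted above might look like the
following.  This is a sketch, not code from the patch: FieldSelectorResult is modelled as an
enum, and IdAndLazyBodySelector with its field names "id" and "body" is hypothetical.

```java
public class FieldSelectorSketch {
    // The four results listed in the issue, modelled here as an enum (assumption).
    enum FieldSelectorResult { LOAD, LAZY_LOAD, NO_LOAD, LOAD_AND_BREAK }

    // The single-method interface as quoted in the issue description.
    interface FieldSelector {
        FieldSelectorResult accept(String fieldName);
    }

    /** Hypothetical selector: load "id" eagerly, "body" lazily, and nothing else. */
    static final class IdAndLazyBodySelector implements FieldSelector {
        @Override
        public FieldSelectorResult accept(String fieldName) {
            if ("id".equals(fieldName)) {
                return FieldSelectorResult.LOAD;
            }
            if ("body".equals(fieldName)) {
                return FieldSelectorResult.LAZY_LOAD;
            }
            return FieldSelectorResult.NO_LOAD;
        }
    }

    public static void main(String[] args) {
        FieldSelector selector = new IdAndLazyBodySelector();
        for (String name : new String[] {"id", "body", "notes"}) {
            System.out.println(name + " -> " + selector.accept(name));
        }
    }
}
```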
 
> The FieldsReader, as it loops over the FieldsInfo, applies the FieldSelector to
> determine what should be done with the current field.
> I modeled this after the java.io.FileFilter mechanism.  There are two implementations
> to date: SetBasedFieldSelector and LoadFirstFieldSelector.  The former takes two sets of
> field names, one to load immediately and one to load lazily.  The latter returns
> LOAD_AND_BREAK on the first field encountered.  See TestFieldsReader for examples.
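
The two selectors and the reader loop described above could be sketched roughly as follows.
The class names follow the issue text, but the bodies are assumptions: selectFields is a
hypothetical stand-in for the loop in FieldsReader, and giving the lazy set precedence when a
name appears in both sets is a choice made for this sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class SelectorLoopSketch {
    enum FieldSelectorResult { LOAD, LAZY_LOAD, NO_LOAD, LOAD_AND_BREAK }

    interface FieldSelector {
        FieldSelectorResult accept(String fieldName);
    }

    /** Two sets of names: one loaded immediately, one loaded lazily. */
    static final class SetBasedFieldSelector implements FieldSelector {
        private final Set<String> fieldsToLoad;
        private final Set<String> lazyFieldsToLoad;

        SetBasedFieldSelector(Set<String> fieldsToLoad, Set<String> lazyFieldsToLoad) {
            this.fieldsToLoad = new HashSet<>(fieldsToLoad);
            this.lazyFieldsToLoad = new HashSet<>(lazyFieldsToLoad);
        }

        @Override
        public FieldSelectorResult accept(String fieldName) {
            // Assumption: a name present in both sets is treated as lazy.
            if (lazyFieldsToLoad.contains(fieldName)) return FieldSelectorResult.LAZY_LOAD;
            if (fieldsToLoad.contains(fieldName)) return FieldSelectorResult.LOAD;
            return FieldSelectorResult.NO_LOAD;
        }
    }

    /** Loads the first field it is asked about and tells the reader to stop. */
    static final class LoadFirstFieldSelector implements FieldSelector {
        @Override
        public FieldSelectorResult accept(String fieldName) {
            return FieldSelectorResult.LOAD_AND_BREAK;
        }
    }

    /** Hypothetical reader loop: records what would happen to each stored field. */
    static List<String> selectFields(List<String> storedFields, FieldSelector selector) {
        List<String> actions = new ArrayList<>();
        for (String name : storedFields) {
            FieldSelectorResult result = selector.accept(name);
            if (result == FieldSelectorResult.NO_LOAD) {
                continue; // a real reader would skip over the stored value here
            }
            actions.add(name + "=" + result);
            if (result == FieldSelectorResult.LOAD_AND_BREAK) {
                break; // stop scanning the remaining fields
            }
        }
        return actions;
    }

    public static void main(String[] args) {
        List<String> stored = Arrays.asList("id", "title", "body", "notes");
        FieldSelector setBased = new SetBasedFieldSelector(
            Collections.singleton("title"), Collections.singleton("body"));
        System.out.println(selectFields(stored, setBased));
        System.out.println(selectFields(stored, new LoadFirstFieldSelector()));
    }
}
```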
> It should support UTF-8 (I borrowed code from Issue 509, thanks!).  See TestFieldsReader
> for examples.
> I added an expert method on IndexInput named skipChars that takes the number of characters
> to skip.  This is a compromise short of changing the file format of the fields to better
> support seeking.  It does some of the work of readChars, but not all of it.  It doesn't
> require buffer storage and it doesn't do the bitwise operations.  It just reads in the
> appropriate number of bytes and promptly ignores them.  This is useful for skipping
> non-binary, non-compressed stored fields.
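
A rough sketch of what such a skip could look like, written against java.io.InputStream
rather than IndexInput.  It assumes a modified-UTF-8 style encoding in which each char takes
one to three bytes (the encoding Java's DataOutputStream.writeUTF produces); the helper names
are hypothetical and this is not the method from the patch.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

public class SkipCharsSketch {
    /** Skips charCount encoded chars without decoding or buffering them. */
    static void skipChars(InputStream in, int charCount) throws IOException {
        for (int i = 0; i < charCount; i++) {
            int lead = readByte(in);
            if ((lead & 0x80) == 0) {
                // 0xxxxxxx: the char fits in this single byte
            } else if ((lead & 0xE0) != 0xE0) {
                readByte(in); // 110xxxxx: one continuation byte follows
            } else {
                readByte(in); // 1110xxxx: two continuation bytes follow
                readByte(in);
            }
        }
    }

    private static int readByte(InputStream in) throws IOException {
        int b = in.read();
        if (b < 0) {
            throw new EOFException("stream ended inside a skipped field");
        }
        return b;
    }

    /** Writes two strings, skips the first by char count, and returns the second. */
    static String demo() {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeUTF("a\u00e9\u20ac"); // 3 chars encoded as 1 + 2 + 3 bytes
            out.writeUTF("next");
            DataInputStream in =
                new DataInputStream(new ByteArrayInputStream(bytes.toByteArray()));
            in.readUnsignedShort(); // discard writeUTF's byte-length prefix
            skipChars(in, 3);
            return in.readUTF();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Only the lead byte of each char is inspected to learn how many bytes follow; no char value
is assembled and nothing is stored, in line with the description above.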
> The biggest change by far is the introduction of the Fieldable interface (as per Doug's
> suggestion in a mailing list thread on lazy field loading a while ago).  Field is now
> a Fieldable.  All uses of Field have been changed to use Fieldable.  FieldsReader.LazyField
> also implements Fieldable.
> Lazy Field loading is now implemented.  It has a major caveat (which is documented):
> it clones the underlying IndexInput upon lazy access to the Field value.  IT IS UNDEFINED
> whether a Lazy Field can be loaded after the IndexInput parent has been closed (although,
> from what I saw, it does work).  I thought about adding a reattach method, but it seems
> just as easy to reload the document.  See TestFieldsReader and DocHelper for examples.
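
The clone-on-access behaviour can be illustrated with a small sketch.  Input and LazyField
below are hypothetical stand-ins for IndexInput and FieldsReader.LazyField, backed by a byte
array; caching the value after the first read is an assumption of this sketch.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class LazyFieldSketch {
    /** Hypothetical stand-in for IndexInput: a positioned reader over a byte array. */
    static final class Input {
        private final byte[] data;
        private int position;
        int copiesMade; // counts copies so the laziness is observable

        Input(byte[] data) {
            this.data = data;
        }

        void seek(int newPosition) {
            position = newPosition;
        }

        int position() {
            return position;
        }

        byte[] readBytes(int length) {
            byte[] out = Arrays.copyOfRange(data, position, position + length);
            position += length;
            return out;
        }

        /** Plays the role of cloning the input: same data, independent position. */
        Input copy() {
            copiesMade++;
            Input copy = new Input(data);
            copy.position = position;
            return copy;
        }
    }

    /** Remembers where the value lives and reads it only when first asked. */
    static final class LazyField {
        private final Input parent;
        private final int offset;
        private final int length;
        private String value; // cached after the first access (assumption)

        LazyField(Input parent, int offset, int length) {
            this.parent = parent;
            this.offset = offset;
            this.length = length;
        }

        String stringValue() {
            if (value == null) {
                Input local = parent.copy(); // leave the parent's position untouched
                local.seek(offset);
                value = new String(local.readBytes(length), StandardCharsets.UTF_8);
            }
            return value;
        }
    }

    public static void main(String[] args) {
        Input input = new Input("id42BODYTEXT".getBytes(StandardCharsets.UTF_8));
        LazyField body = new LazyField(input, 4, 8);
        System.out.println(input.copiesMade);   // nothing has been read yet
        System.out.println(body.stringValue()); // first access copies the input and reads
        System.out.println(input.copiesMade);
    }
}
```

The sketch does not model a closed parent, which is the case the issue text leaves
undefined.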
> I updated a couple of other tests to reflect the new fields that are on the DocHelper
> document.
> All tests pass.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira



