lucene-dev mailing list archives

From "Simon Willnauer (JIRA)" <j...@apache.org>
Subject [jira] Created: (LUCENE-2935) Let Codec consume entire document
Date Mon, 21 Feb 2011 18:31:38 GMT
Let Codec consume entire document
---------------------------------

                 Key: LUCENE-2935
                 URL: https://issues.apache.org/jira/browse/LUCENE-2935
             Project: Lucene - Java
          Issue Type: Improvement
          Components: Codecs, Index
    Affects Versions: CSF branch, 4.0
            Reporter: Simon Willnauer
            Assignee: Simon Willnauer
             Fix For: CSF branch, 4.0


Currently the codec API is limited to consuming Terms & Postings upon a segment flush. To
enable stored fields & DocValues to make use of the Codec abstraction, codecs should allow
pulling a consumer ahead of flush time and consuming all values from a document's fields through
a consumer API. An alternative to consuming the entire document would be extending FieldsConsumer
to return a StoredValueConsumer / DocValuesConsumer side by side with the TermsConsumer, as is
currently done in the DocValues branch. Yet, extending it has proven to be very tricky
and error-prone for several reasons:
* FieldsConsumer requires a SegmentWriteState, which might be different upon flush compared to
when the document is consumed. SegmentWriteState must therefore be created twice: 1. when the
first DocValues field is indexed, and 2. when the segment is flushed.
* FieldsConsumers are currently pulled for each indexed field, no matter if there are terms to
be indexed or not. Yet, if we use something like DocValuesCodec, which essentially wraps another
codec and creates a FieldsConsumer on demand, the wrapped codec's consumer might not be initialized
even if the field is indexed. This causes problems once such a field is opened but the files
required by that codec are missing. I added some harsh logic to work around this, which should
be avoided.
* SegmentCodecs are created for each SegmentWriteState, which might yield wrong codec IDs depending
on how field numbers are assigned. We currently depend on the fact that all fields for a
segment, and therefore their codecs, are known when SegmentCodecs are built. To enable consuming
per-document values in codecs, we need to do that incrementally.
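
As a rough illustration of the last point, incremental assignment could hand out a codec ID the
first time a codec is seen and never change it afterwards. This is only a sketch and not the
actual SegmentCodecs code: the class IncrementalCodecIds and its methods are hypothetical names
made up for this example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Lucene: assigns codec IDs in first-seen order
// so that an ID stays stable even when more fields show up later in the segment.
final class IncrementalCodecIds {
    private final Map<String, Integer> idsByCodecName = new LinkedHashMap<>();

    // Returns the existing ID for a codec name, or assigns the next free one.
    int idFor(String codecName) {
        Integer id = idsByCodecName.get(codecName);
        if (id == null) {
            id = idsByCodecName.size();
            idsByCodecName.put(codecName, id);
        }
        return id;
    }

    // Number of distinct codecs registered so far.
    int size() {
        return idsByCodecName.size();
    }
}
```

With this shape, a codec that is first seen while a document is consumed keeps the same ID when
the segment is flushed, regardless of the order in which the remaining fields arrive.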

Codecs should instead provide a DocumentConsumer, created prior to flush, side by side with the
FieldsConsumer. This is also a prerequisite for LUCENE-2621.
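
A minimal sketch of what such a per-document API might look like follows. The issue does not
define any signatures, so everything here is an assumption for illustration: the method names on
DocumentConsumer, the FieldValue holder, and the SketchCodec and RecordingCodec types are all
hypothetical and are not Lucene classes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical holder for one field value of a document.
final class FieldValue {
    final String name;
    final String value;

    FieldValue(String name, String value) {
        this.name = name;
        this.value = value;
    }
}

// Hypothetical per-document consumer that a codec hands out before flush time.
interface DocumentConsumer {
    void startDocument(int docId);
    void consumeField(FieldValue field);
    void finishDocument();
}

// Hypothetical codec surface; only the per-document side is sketched here.
// The flush-time FieldsConsumer would sit next to it.
interface SketchCodec {
    DocumentConsumer documentConsumer();
}

// Toy codec that records what it was given instead of writing files.
final class RecordingCodec implements SketchCodec {
    final List<String> log = new ArrayList<>();

    @Override
    public DocumentConsumer documentConsumer() {
        return new DocumentConsumer() {
            @Override public void startDocument(int docId) { log.add("start " + docId); }
            @Override public void consumeField(FieldValue f) { log.add(f.name + "=" + f.value); }
            @Override public void finishDocument() { log.add("finish"); }
        };
    }
}

final class DocumentConsumerDemo {
    // Feeds one document with two fields through the consumer while "indexing".
    static List<String> run() {
        RecordingCodec codec = new RecordingCodec();
        DocumentConsumer consumer = codec.documentConsumer();
        consumer.startDocument(0);
        consumer.consumeField(new FieldValue("title", "hello"));
        consumer.consumeField(new FieldValue("price", "42"));
        consumer.finishDocument();
        return codec.log;
    }

    public static void main(String[] args) {
        run().forEach(System.out::println);
    }
}
```

The point of the shape is that the consumer is pulled once, ahead of flush, and then sees every
field of each document as it is indexed, so no second SegmentWriteState is needed at that stage.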
