lucene-dev mailing list archives

From "Andrzej Bialecki (Commented) (JIRA)" <>
Subject [jira] [Commented] (LUCENE-2621) Extend Codec to handle also stored fields and term vectors
Date Wed, 05 Oct 2011 12:32:34 GMT


Andrzej Bialecki commented on LUCENE-2621:

CodecProvider -> Codec and Codec -> FieldCodec makes sense to me. This way the Codec
would be responsible for all global index parts (seginfos, fieldinfos), and it would provide
an API to manage per-field data (FieldCodec), stored fields (FieldsWriter/Reader, perhaps with a
class to tie these two together), and term vectors (TermVectorsWriter/Reader, again grouped in a
class).
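
The layout proposed above could be sketched roughly as follows. This is only an illustration of the roles involved, not the actual Lucene API: StoredFieldsFormat and TermVectorsFormat are hypothetical names for the "class to tie these two together", and the format names passed in main are plain labels.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: these types only mirror the roles named in the comment,
// they are not the real Lucene classes.
public class CodecLayoutSketch {

    /** Per-field data format (the role the current Codec plays). */
    interface FieldCodec { String name(); }

    /** Ties a stored-fields writer and reader together under one name (assumed grouping). */
    interface StoredFieldsFormat { String name(); }

    /** Ties a term-vectors writer and reader together under one name (assumed grouping). */
    interface TermVectorsFormat { String name(); }

    /** Index-wide entry point (the role CodecProvider plays today). */
    static final class Codec {
        private final FieldCodec defaultFieldCodec;
        private final StoredFieldsFormat storedFields;
        private final TermVectorsFormat termVectors;
        private final Map<String, FieldCodec> perField = new HashMap<>();

        Codec(FieldCodec defaultFieldCodec,
              StoredFieldsFormat storedFields,
              TermVectorsFormat termVectors) {
            this.defaultFieldCodec = defaultFieldCodec;
            this.storedFields = storedFields;
            this.termVectors = termVectors;
        }

        /** Overrides the per-field format for one field. */
        Codec setFieldCodec(String field, FieldCodec codec) {
            perField.put(field, codec);
            return this;
        }

        /** Returns the override for the field, or the default if none was set. */
        FieldCodec fieldCodec(String field) {
            return perField.getOrDefault(field, defaultFieldCodec);
        }

        StoredFieldsFormat storedFields() { return storedFields; }

        TermVectorsFormat termVectors() { return termVectors; }
    }

    public static void main(String[] args) {
        Codec codec = new Codec(() -> "Standard", () -> "PlainStoredFields", () -> "PlainTermVectors")
                .setFieldCodec("id", () -> "Pulsing");
        System.out.println(codec.fieldCodec("id").name());
        System.out.println(codec.fieldCodec("body").name());
        System.out.println(codec.storedFields().name());
    }
}
```

The point of the sketch is that one index-wide object hands out both the per-field formats and the stored-fields and term-vectors formats, so callers never need to know which concrete writer or reader is behind them.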

Re. current patch: this looks like a great start. I discovered a problem lurking in
SegmentMerger.mergeFields: setMatchingSegmentReaders checks only fieldInfos for compatibility,
but it doesn't check that the codecs (and their fieldWriter/fieldReader) are compatible. It then
happily uses the matchingSegmentReader directly, so raw documents encoded by one codec are sent
as a raw stream to a fieldWriter from another codec.
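
To make the missing check concrete, here is a minimal sketch under simplified assumptions. Segment, sameFieldNumbering and canBulkCopy are hypothetical stand-ins, not the SegmentMerger code, and the stored-fields format is reduced to a name.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical model of the raw-copy decision during a merge; not Lucene code.
public class BulkMergeCheckSketch {

    /** Stand-in for a segment: its field numbering plus the format that wrote its stored fields. */
    static final class Segment {
        final List<String> fieldNames;   // field name by field number
        final String storedFieldsFormat; // name of the stored-fields format (assumed identifier)

        Segment(String storedFieldsFormat, String... fieldNames) {
            this.storedFieldsFormat = storedFieldsFormat;
            this.fieldNames = Arrays.asList(fieldNames);
        }
    }

    /** The check as described in the comment: only the field numbering is compared. */
    static boolean sameFieldNumbering(Segment source, Segment target) {
        return source.fieldNames.equals(target.fieldNames);
    }

    /** A raw byte copy is only safe when the on-disk format matches as well. */
    static boolean canBulkCopy(Segment source, Segment target) {
        return sameFieldNumbering(source, target)
                && source.storedFieldsFormat.equals(target.storedFieldsFormat);
    }

    public static void main(String[] args) {
        Segment merged = new Segment("FormatA", "id", "body");
        Segment sameFormat = new Segment("FormatA", "id", "body");
        Segment otherFormat = new Segment("FormatB", "id", "body");

        System.out.println(canBulkCopy(sameFormat, merged));         // true
        System.out.println(sameFieldNumbering(otherFormat, merged)); // true
        System.out.println(canBulkCopy(otherFormat, merged));        // false
    }
}
```

A segment that fails canBulkCopy would have to be merged document by document through its own reader instead of being streamed as raw bytes.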

Also, SegmentInfo.files() is messy; it should be populated from codecs. As soon as I changed
the extension of the stored fields file, things exploded because TieredMergePolicy couldn't
find the .fdx file reported by files().
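
One way to read "populated from codecs" is sketched below: each codec component reports the files it writes, and the segment's file list is just the union. FormatComponent, storedFields and files are hypothetical names for illustration, and the .xdx/.xdt extensions are made up to stand for a changed stored-fields format.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of a file list derived from codec components; not Lucene code.
public class SegmentFilesSketch {

    /** A codec component that knows which files it writes for a segment. */
    interface FormatComponent {
        void files(String segmentName, Set<String> out);
    }

    /** Stored-fields component whose index and data extensions are configurable. */
    static FormatComponent storedFields(String indexExt, String dataExt) {
        return (segmentName, out) -> {
            out.add(segmentName + "." + indexExt);
            out.add(segmentName + "." + dataExt);
        };
    }

    /** Collects the file names from every component instead of hard-coding extensions. */
    static Set<String> files(String segmentName, List<FormatComponent> components) {
        Set<String> out = new LinkedHashSet<>();
        for (FormatComponent component : components) {
            component.files(segmentName, out);
        }
        return out;
    }

    public static void main(String[] args) {
        // Default extensions for the stored fields index and data files.
        System.out.println(files("_0", Arrays.asList(storedFields("fdx", "fdt"))));
        // A format with different (made-up) extensions no longer reports .fdx at all.
        System.out.println(files("_0", Arrays.asList(storedFields("xdx", "xdt"))));
    }
}
```

With this shape, a caller such as a merge policy only ever sees file names that the active format actually wrote.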

> Extend Codec to handle also stored fields and term vectors
> ----------------------------------------------------------
>                 Key: LUCENE-2621
>                 URL:
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: core/index
>    Affects Versions: 4.0
>            Reporter: Andrzej Bialecki 
>            Assignee: Robert Muir
>              Labels: gsoc2011, lucene-gsoc-11, mentor
>         Attachments: LUCENE-2621_rote.patch
> Currently Codec API handles only writing/reading of term-related data, while stored fields
> data and term frequency vector data writing/reading is handled elsewhere.
> I propose to extend the Codec API to handle this data as well.
