lucene-dev mailing list archives

From "Robert Muir (Updated) (JIRA)" <>
Subject [jira] [Updated] (LUCENE-2621) Extend Codec to handle also stored fields and term vectors
Date Tue, 15 Nov 2011 21:19:51 GMT


Robert Muir updated LUCENE-2621:

    Attachment: LUCENE-2621_tv_fi_si.patch

Attached is a new patch with the diff between trunk and the branch. I think it's at a point where it's ready for merging.
* term vectors and fieldinfos move to the codec.
* segmentinfos is moved to the codec (previously you could only realistically tweak a few things).
* term vectors are cut over to the flex APIs.
* much better testing of term vectors in CheckIndex.
* added SimpleText impls of term vectors, fieldinfos, and segmentinfos.
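
As a rough illustration of what "moving fieldinfos to the codec" with a plain-text implementation might look like, here is a minimal, self-contained sketch. It does not use Lucene's classes: `FieldMeta`, `FieldInfosFormatSketch`, `CodecLike`, and `SimpleTextFieldInfos` are hypothetical names invented for this example, and the text layout is an assumption, not the format written by the patch.

```java
import java.util.ArrayList;
import java.util.List;

public class CodecSketch {
    /** Hypothetical per-field metadata; not Lucene's FieldInfo. */
    public static final class FieldMeta {
        public final int number;
        public final String name;
        public final boolean storeTermVectors;

        public FieldMeta(int number, String name, boolean storeTermVectors) {
            this.number = number;
            this.name = name;
            this.storeTermVectors = storeTermVectors;
        }
    }

    /** Hypothetical format hook: the codec decides how field metadata is encoded. */
    public interface FieldInfosFormatSketch {
        String write(List<FieldMeta> fields);
        List<FieldMeta> read(String data);
    }

    /** Hypothetical codec: the single place that hands out per-segment formats. */
    public interface CodecLike {
        FieldInfosFormatSketch fieldInfosFormat();
    }

    /** Human-readable encoding, one field per line (assumes names contain no spaces). */
    public static final class SimpleTextFieldInfos implements FieldInfosFormatSketch {
        @Override
        public String write(List<FieldMeta> fields) {
            StringBuilder sb = new StringBuilder();
            for (FieldMeta f : fields) {
                sb.append("number ").append(f.number)
                  .append(" name ").append(f.name)
                  .append(" vectors ").append(f.storeTermVectors)
                  .append('\n');
            }
            return sb.toString();
        }

        @Override
        public List<FieldMeta> read(String data) {
            List<FieldMeta> out = new ArrayList<>();
            for (String line : data.split("\n")) {
                if (line.isEmpty()) {
                    continue;
                }
                // Tokens: "number", <n>, "name", <name>, "vectors", <bool>
                String[] p = line.split(" ");
                out.add(new FieldMeta(Integer.parseInt(p[1]), p[3], Boolean.parseBoolean(p[5])));
            }
            return out;
        }
    }

    /** Returns a codec whose field metadata is stored as plain text. */
    public static CodecLike simpleText() {
        final FieldInfosFormatSketch format = new SimpleTextFieldInfos();
        return new CodecLike() {
            @Override
            public FieldInfosFormatSketch fieldInfosFormat() {
                return format;
            }
        };
    }

    public static void main(String[] args) {
        List<FieldMeta> fields = new ArrayList<>();
        fields.add(new FieldMeta(0, "title", true));
        fields.add(new FieldMeta(1, "body", false));
        // Callers go through the codec and never touch the encoding directly.
        System.out.print(simpleText().fieldInfosFormat().write(fields));
    }
}
```

The point of the sketch is only the shape: callers ask the codec for a format, so swapping in a different encoding means swapping the codec.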

After this I would propose closing this issue and opening follow-up issues to:
* make a new, more efficient term vector implementation for 4.0; the existing one would go to preflex, and the preflex impl should reorder the terms correctly into UTF-8 order (this has been a bug in trunk all along, not one caused here!)
* see if we can remove the global .fnx file completely, as it's not per-segment and I'm not sure it's totally necessary; perhaps the field number consistency can be achieved with another mechanism. Otherwise, we should at least add a codec hack/hook so that PreFlexRW can write segments without .fnx files.
* make preflex implementations of the various other readers/writers so that our 4.0 impls are clean and don't contain backwards-compatibility code, and so that we get more realistic backwards-compatibility testing with PreFlexRW.
* allow adding offsets to the postings list impls, either via startOffset()/endOffset() or via an attribute as term vectors do in this patch, so that a DocsAndPositionsEnum (D&Penum) can retrieve the offsets at a position. This could make highlighting much faster without having to use vectors.
* try to make a few other things, like deletes, extensible via the codec.
* figure out a good design to cut over norms to DocValues.
* add a SimpleTextDocValues; it's sorely needed.
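
The term-ordering problem in the first follow-up item can be reproduced without Lucene. The sketch below assumes the mismatch is between UTF-16 code unit order (what Java's `String.compareTo` gives) and UTF-8 byte order; the two disagree when a supplementary character is compared with a BMP character above the surrogate range. `TermOrderDemo` and `compareUtf8` are names made up for this illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class TermOrderDemo {
    /** Compares two terms by their UTF-8 encoding, byte by byte, as unsigned values. */
    public static int compareUtf8(String a, String b) {
        byte[] x = a.getBytes(StandardCharsets.UTF_8);
        byte[] y = b.getBytes(StandardCharsets.UTF_8);
        int n = Math.min(x.length, y.length);
        for (int i = 0; i < n; i++) {
            int d = (x[i] & 0xFF) - (y[i] & 0xFF);
            if (d != 0) {
                return d;
            }
        }
        // All shared bytes equal: the shorter term sorts first.
        return x.length - y.length;
    }

    /** Formats the first code point of each term as U+XXXX for printing. */
    static String describe(String[] terms) {
        StringBuilder sb = new StringBuilder();
        for (String t : terms) {
            if (sb.length() > 0) {
                sb.append(", ");
            }
            sb.append(String.format("U+%04X", t.codePointAt(0)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String bmp = "\uFF5E";         // U+FF5E: one UTF-16 unit, UTF-8 bytes EF BD 9E
        String supp = "\uD835\uDD04";  // U+1D504: surrogate pair, UTF-8 bytes F0 9D 94 84

        String[] utf16Order = {bmp, supp};
        Arrays.sort(utf16Order);       // natural String order compares UTF-16 code units

        String[] utf8Order = {bmp, supp};
        Arrays.sort(utf8Order, TermOrderDemo::compareUtf8);

        System.out.println("UTF-16 order: " + describe(utf16Order));
        System.out.println("UTF-8 order:  " + describe(utf8Order));
    }
}
```

In UTF-16 order the surrogate pair (lead unit 0xD835) sorts before U+FF5E, while in UTF-8 order its lead byte 0xF0 sorts after 0xEF, so a reader that expects one order but is handed the other would see these two terms swapped.
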
> Extend Codec to handle also stored fields and term vectors
> ----------------------------------------------------------
>                 Key: LUCENE-2621
>                 URL:
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: core/index
>    Affects Versions: 4.0
>            Reporter: Andrzej Bialecki 
>            Assignee: Robert Muir
>              Labels: gsoc2011, lucene-gsoc-11, mentor
>         Attachments: LUCENE-2621.patch, LUCENE-2621_rote.patch, LUCENE-2621_tv_fi_si.patch
> Currently the Codec API handles only writing/reading of term-related data, while writing/reading of stored fields data and term frequency vector data is handled elsewhere.
> I propose to extend the Codec API to handle this data as well.
