lucene-dev mailing list archives

From "Simon Willnauer (JIRA)" <>
Subject [jira] Commented: (LUCENE-2621) Extend Codec to handle also stored fields and term vectors
Date Mon, 14 Mar 2011 14:50:29 GMT


Simon Willnauer commented on LUCENE-2621:

bq. When you mentioned Codec API, do you mean the abstract class org.apache.lucene.index.codecs.Codec?
Yes, that is the main entry point. Currently a codec offers a FieldsConsumer, which the
IndexWriter pulls when a flush is requested. Codecs are assigned per field and per segment via
the CodecProvider, so each field can have its own codec and each codec can have a different
implementation. So far, however, we only provide codec support for the inverted index, so a codec
can customize the term dictionary (TermsEnum is the API counterpart) and the posting lists
(DocsEnum / DocsAndPositionsEnum in the API). What this issue tries to do is open up this API as
a general low-level customization layer that also lets users customize how stored fields and
term vectors are stored on disk.
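
To make the per-field assignment a bit more concrete, here is a rough toy sketch in plain Java. None of these types are Lucene classes: ToyCodec, CustomStoredCodec, ToyCodecProvider and the codec names are made up for illustration, and the real Codec / CodecProvider signatures differ. It only models the idea that a provider hands out one codec per field, and that the proposal would let a codec decide the stored-fields format as well as the postings format.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model only: none of these types are Lucene classes.
class PerFieldCodecSketch {

    // Hypothetical stand-in for a codec; it only reports what it would write.
    static class ToyCodec {
        final String name;

        ToyCodec(String name) {
            this.name = name;
        }

        // The part codecs cover today: terms and postings of the inverted index.
        String writePostings(String field) {
            return name + ":postings(" + field + ")";
        }

        // The part this issue proposes to add: stored fields (term vectors
        // would follow the same pattern). The default models the current,
        // fixed on-disk format.
        String writeStoredFields(String field) {
            return "fixed:stored(" + field + ")";
        }
    }

    // A codec that also overrides the stored-fields format.
    static class CustomStoredCodec extends ToyCodec {
        CustomStoredCodec(String name) {
            super(name);
        }

        @Override
        String writeStoredFields(String field) {
            return name + ":stored(" + field + ")";
        }
    }

    // Hypothetical stand-in for a provider that maps field names to codecs.
    static class ToyCodecProvider {
        private final Map<String, ToyCodec> perField = new HashMap<>();
        private final ToyCodec defaultCodec;

        ToyCodecProvider(ToyCodec defaultCodec) {
            this.defaultCodec = defaultCodec;
        }

        void setFieldCodec(String field, ToyCodec codec) {
            perField.put(field, codec);
        }

        ToyCodec lookup(String field) {
            return perField.getOrDefault(field, defaultCodec);
        }
    }

    // Mimics a flush: look up each field's codec and let it write its data.
    static List<String> flush(ToyCodecProvider provider, String... fields) {
        List<String> written = new ArrayList<>();
        for (String field : fields) {
            ToyCodec codec = provider.lookup(field);
            written.add(codec.writePostings(field));
            written.add(codec.writeStoredFields(field));
        }
        return written;
    }

    public static void main(String[] args) {
        ToyCodecProvider provider = new ToyCodecProvider(new ToyCodec("default"));
        provider.setFieldCodec("id", new CustomStoredCodec("custom"));
        for (String line : flush(provider, "id", "body")) {
            System.out.println(line);
        }
    }
}
```

In this sketch the "id" field gets its own codec for both postings and stored fields, while "body" falls back to the default codec and the fixed stored-fields format, which is roughly the situation before this issue.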

bq. Term vectors refer to org.apache.lucene.index.TermFreqVector, and it is processed by TermVectorsWriter now, correct?
Yes, that's true.

bq. But what are the stored fields? I cannot find them immediately.
There should be a StoredFieldsWriter and a FieldsReader.

bq. BTW, is there any design document of Lucene in the Wiki?

Nothing that I would call a design document. There are some pages that could be similar to
what you are looking for, but those might be out of date. You should probably look into the
corresponding issues to find the design decisions.

> Extend Codec to handle also stored fields and term vectors
> ----------------------------------------------------------
>                 Key: LUCENE-2621
>                 URL:
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>    Affects Versions: 4.0
>            Reporter: Andrzej Bialecki 
>              Labels: gsoc2011, lucene-gsoc-11, mentor
> Currently the Codec API handles only the writing/reading of term-related data, while the writing/reading of stored fields data and term frequency vector data is handled elsewhere.
> I propose to extend the Codec API to handle this data as well.

