lucene-java-user mailing list archives

From Michael McCandless <>
Subject Re: IndexingChain and TermHash
Date Mon, 16 Nov 2009 13:01:21 GMT
Yes, the branch is here:

Mark (Miller) periodically re-syncs it to trunk.

All tests should pass, and if you create a new Codec, please share the

There are not yet many Codecs in existence... the branch has the
"standard" codec (closest to Lucene's current index format, but makes
some compelling improvements to the terms dict), a "pulsing" codec
(which inlines low-freq terms into the terms dict), and an intblock
codec (an abstract base for building int-block codecs).  There's also the
PForDelta codec, attached to LUCENE-1410, which subclasses the
intblock codec and uses PForDelta encoding.  It's probably best to
peek at these example codecs for inspiration on how to build yours.
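
As a rough illustration of the pulsing idea (terms at or below a
frequency cutoff keep their postings inline in the terms dict, all other
terms point into a separate postings store), here is a toy sketch. It is
not code from the branch and does not use Lucene's API: the class
PulsingSketch, its methods, and the cutoff parameter are all hypothetical
names chosen for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Toy illustration of "pulsing"; hypothetical names, not Lucene's API. */
final class PulsingSketch {

    /** One terms-dict entry: inlined doc IDs, or an offset into the postings store. */
    private static final class TermEntry {
        final int[] inlinedDocs;   // non-null only for low-frequency terms
        final int postingsOffset;  // meaningful only when inlinedDocs == null
        final int docFreq;

        TermEntry(int[] inlinedDocs, int postingsOffset, int docFreq) {
            this.inlinedDocs = inlinedDocs;
            this.postingsOffset = postingsOffset;
            this.docFreq = docFreq;
        }
    }

    private final int cutoff; // assumed knob: max docFreq that still gets inlined
    private final Map<String, TermEntry> termsDict = new TreeMap<>();
    private final List<Integer> postingsStore = new ArrayList<>();

    public PulsingSketch(int cutoff) {
        this.cutoff = cutoff;
    }

    /** Adds a term; low-frequency terms are inlined, the rest go to the postings store. */
    public void addTerm(String term, int[] docs) {
        if (docs.length <= cutoff) {
            termsDict.put(term, new TermEntry(docs.clone(), -1, docs.length));
        } else {
            int offset = postingsStore.size();
            for (int doc : docs) {
                postingsStore.add(doc);
            }
            termsDict.put(term, new TermEntry(null, offset, docs.length));
        }
    }

    /** Returns the doc IDs for a term, from the inlined entry or the postings store. */
    public int[] docs(String term) {
        TermEntry entry = termsDict.get(term);
        if (entry == null) {
            return new int[0];
        }
        if (entry.inlinedDocs != null) {
            return entry.inlinedDocs.clone();
        }
        int[] out = new int[entry.docFreq];
        for (int i = 0; i < entry.docFreq; i++) {
            out[i] = postingsStore.get(entry.postingsOffset + i);
        }
        return out;
    }

    /** True if the term's postings live inside the terms dict entry. */
    public boolean isInlined(String term) {
        TermEntry entry = termsDict.get(term);
        return entry != null && entry.inlinedDocs != null;
    }

    public static void main(String[] args) {
        PulsingSketch index = new PulsingSketch(1);
        index.addTerm("rare", new int[] {7});
        index.addTerm("common", new int[] {1, 2, 3, 5, 8});
        System.out.println("rare inlined: " + index.isInlined("rare"));
        System.out.println("common inlined: " + index.isInlined("common"));
    }
}
```

The point of the split is that looking up a rare term never has to touch
the separate postings store; a real codec would of course write both
structures to index files rather than keep them in memory.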


On Mon, Nov 16, 2009 at 7:28 AM, Renaud Delbru <> wrote:
> Hi Michael,
> I see that a huge amount of work has already been done in LUCENE-1458. Is
> there a way to check out the corresponding branch and start using it? At
> least, to see if I can extend it and create my own Codec.
> I have started on my side to abstract the indexing chain of Lucene 2.9, in
> order to be able to plug in my own chain, but I have the impression that
> you've done something similar already (with the codec abstraction). It would
> be a pity to lose my time doing something less convenient than your approach.
> Thanks.
> --
> Renaud Delbru
> On 14/11/09 13:22, Michael McCandless wrote:
>> On Fri, Nov 6, 2009 at 1:34 PM, Renaud Delbru <> wrote:
>>> Hi Michael,
>>> Thanks for the quick fix. I have tested it (indexing multiple documents +
>>> searching), and it seems to work.
>>> On 06/11/09 18:09, Michael McCandless wrote:
>>>> To be honest, you are sort of forging new territory here :)
>>> I think so too, not an easy task ;o). I have seen that you have tried to
>>> make the indexing chain of Lucene (DocumentsWriter) modular. I am still
>>> trying to get a good understanding of the default indexing, but I would
>>> like to see how easy (or difficult) it is to modify the format of the
>>> postings. From my current understanding, it seems that only the consumer
>>> at the end of this chain (FreqProxTermsWriter and its consumer
>>> FormatPostingsFieldsWriter) has to be changed to a certain extent.
>> Right, those two classes do the writing of the postings, currently.
>> But with flexible indexing (LUCENE-1458), still in progress, we hope
>> to make the codec that actually reads & writes the postings more
>> easily pluggable.
>> Mike
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail:
>> For additional commands, e-mail:
