lucene-dev mailing list archives

From "Andrzej Bialecki (JIRA)" <>
Subject [jira] Commented: (LUCENE-2373) Create a Codec to work with streaming and append-only filesystems
Date Thu, 01 Jul 2010 14:22:49 GMT


Andrzej Bialecki  commented on LUCENE-2373:

bq. Probably the codec could return a (private to it) subclass of SegmentInfo to hold such
extra info... 

Nice idea, I didn't think about this - yes, this should be possible now.

bq. Maybe we should provide default impls for CodecProvider.getSegmentInfosReader/Writer?
(Ie returning the Default impls)

DefaultCodecProvider does exactly this. Or do you mean instead of using abstract methods in

bq. Also, should we factor out the "leave space for index pointer" (out.writeLong(0)) to the
subclass? (And, the reading of that dirOffset). Because this is wasted now for the appending

The reading is already factored out, but the writing ... Well, it's just 8 bytes per segment
... the reason I didn't factor it out is that it would require additional before/after delegation,
or a replication of larger sections of code...

> Create a Codec to work with streaming and append-only filesystems
> -----------------------------------------------------------------
>                 Key: LUCENE-2373
>                 URL:
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>            Reporter: Andrzej Bialecki 
>             Fix For: 4.0
>         Attachments: appending.patch
> Since the early 2.x days, Lucene has used a skip/seek/write trick to patch the length of the
terms dict into a place near the start of the output data file. This, however, makes it impossible
to use Lucene with append-only filesystems such as HDFS.
> In the post-flex trunk the following code in StandardTermsDictWriter initiates this:
> {code}
>     // Count indexed fields up front
>     CodecUtil.writeHeader(out, CODEC_NAME, VERSION_CURRENT); 
>     out.writeLong(0);                             // leave space for end index pointer
> {code}
> and completes this in close():
> {code}
>       out.writeLong(dirStart);
> {code}
> I propose to change this layout so that this pointer is stored simply at the end of the
file. It's always 8 bytes long, and we know the final length of the file from Directory,
so it's a single additional seek(length - 8) to read it, which is not much considering the

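The proposed layout can be illustrated with a minimal sketch. It assumes plain java.io streams standing in for Lucene's output and input classes. The class name TrailerPointerSketch and its methods are hypothetical and are not taken from appending.patch. The writer only ever appends, and the reader finds the directory start by reading the last 8 bytes of the file.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only; not the code from the attached patch.
public class TrailerPointerSketch {

    /** Appends body, then directory, then an 8-byte pointer to the directory start. */
    public static byte[] write(byte[] body, byte[] directory) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        out.write(body);
        long dirStart = out.size();   // offset where the directory begins
        out.write(directory);
        out.writeLong(dirStart);      // trailer: no seek back to the header is needed
        out.flush();
        return bytes.toByteArray();
    }

    /** Reads the pointer from the last 8 bytes, i.e. at (length - 8). */
    public static long readDirStart(byte[] file) {
        // DataOutputStream.writeLong is big-endian, which is ByteBuffer's default order.
        return ByteBuffer.wrap(file).getLong(file.length - 8);
    }

    public static void main(String[] args) throws IOException {
        byte[] body = "terms".getBytes(StandardCharsets.US_ASCII);
        byte[] directory = "index".getBytes(StandardCharsets.US_ASCII);
        byte[] file = write(body, directory);
        System.out.println("dirStart = " + readDirStart(file));
    }
}
```

The cost on the read side is the one extra seek to (length - 8) mentioned above. In exchange, the write side never has to revisit bytes it has already written.
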
This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
