lucene-dev mailing list archives

From "Michael McCandless (JIRA)" <j...@apache.org>
Subject [jira] Commented: (LUCENE-2491) Extend Codec with a SegmentInfos writer / reader
Date Mon, 07 Jun 2010 11:00:56 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-2491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12876200#action_12876200 ]

Michael McCandless commented on LUCENE-2491:
--------------------------------------------

This sounds great!

I've wanted to let Codecs store their own data in each SegmentInfo (e.g., the hasProx boolean
really ought not to be a core thing but a Codec-private flag instead).  Maybe this is a way to do that...

The only odd thing is... Codec is per-segment now.  Every segment is free to have a different
Codec (even within a single session of IW).  So having Codec write the segments file doesn't
really "fit"; I guess CodecProvider could do so?

Multiple segments files can exist in the index at a time; the requirement would then be that
the current CodecProvider must always be able to read all segments files written by past CodecProviders.
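
To illustrate the backward-compatibility requirement above, here is a minimal sketch, assuming each segments file starts with a format name that selects the reader. Everything in it (the SegmentsFormatRegistry class, the Reader interface, the "v1"/"v2" names) is hypothetical and is not part of Lucene's Codec or CodecProvider API.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; not Lucene's CodecProvider.
final class SegmentsFormatRegistry {

    // Decodes the body of a segments file written in one specific format.
    public interface Reader {
        String read(DataInputStream in) throws IOException;
    }

    private final Map<String, Reader> readers = new HashMap<>();

    // Readers for old formats stay registered so old files remain readable.
    public void register(String formatName, Reader reader) {
        readers.put(formatName, reader);
    }

    // Assumption: the file begins with a UTF-encoded format name.
    public String read(byte[] file) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(file));
        String formatName = in.readUTF();
        Reader reader = readers.get(formatName);
        if (reader == null) {
            throw new IOException("unknown segments format: " + formatName);
        }
        return reader.read(in);
    }
}
```

A provider that only ever adds entries to such a registry can still open any segments file written by an earlier provider, which is the property required here.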

We could alternatively make it an option for IW to use a normal IndexOutput when writing segments
files (skipping the checksum).

Once you remove the checksum for HDFS, how will you ensure the written segments file is consistent?
Or is a partially written segments file (due to e.g. an OS crash or power loss, as can happen
on "ordinary" filesystems) never an issue with HDFS?
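
For reference, a consistency check does not strictly require seek. Below is a minimal sketch, assuming the checksum can simply be appended as a trailer after the data. It uses only java.io and java.util.zip; the AppendOnlyChecksum class and its file layout (length, payload, CRC32) are hypothetical and are not how ChecksumIndexOutput or SegmentInfos actually work.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;
import java.util.zip.CheckedInputStream;
import java.util.zip.CheckedOutputStream;

// Hypothetical illustration only; not a Lucene API.
final class AppendOnlyChecksum {

    // Layout: int length, payload bytes, long CRC32. Every write is forward-only.
    public static byte[] write(byte[] payload) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        CheckedOutputStream checked = new CheckedOutputStream(sink, new CRC32());
        DataOutputStream out = new DataOutputStream(checked);
        out.writeInt(payload.length);
        out.write(payload);
        out.flush();
        long crc = checked.getChecksum().getValue(); // covers length + payload
        // Append the trailer directly to the sink, after everything else.
        new DataOutputStream(sink).writeLong(crc);
        return sink.toByteArray();
    }

    // Throws IOException if the file is truncated or its checksum does not match.
    public static byte[] read(byte[] file) throws IOException {
        CheckedInputStream checked =
            new CheckedInputStream(new ByteArrayInputStream(file), new CRC32());
        DataInputStream in = new DataInputStream(checked);
        int length = in.readInt();
        if (length < 0 || length > file.length) {
            throw new IOException("corrupt length: " + length);
        }
        byte[] payload = new byte[length];
        in.readFully(payload); // EOFException if the payload was cut short
        long actual = checked.getChecksum().getValue();
        long expected = in.readLong(); // EOFException if the trailer is missing
        if (actual != expected) {
            throw new IOException("checksum mismatch");
        }
        return payload;
    }

    public static void main(String[] args) throws IOException {
        byte[] file = write("segment names".getBytes(StandardCharsets.UTF_8));
        System.out.println(new String(read(file), StandardCharsets.UTF_8));
    }
}
```

With this layout a reader treats a file with a missing or mismatched trailer as partially written and can fall back to an older segments file. Whether that is sufficient on HDFS is exactly the open question here.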

> Extend Codec with a SegmentInfos writer / reader
> ------------------------------------------------
>
>                 Key: LUCENE-2491
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2491
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>    Affects Versions: 4.0
>            Reporter: Andrzej Bialecki 
>
> I'm trying to implement a Codec that works with append-only filesystems (HDFS). It's
> _almost_ done, except for the SegmentInfos.write(dir), which uses ChecksumIndexOutput, which
> in turn uses IndexOutput.seek() - and seek is not supported on append-only output. I propose
> to extend the Codec interface to encapsulate also the details of SegmentInfos writing / reading.
> Patch to follow after some feedback ;)
