lucene-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Michael McCandless (JIRA)" <j...@apache.org>
Subject [jira] Commented: (LUCENE-702) Disk full during addIndexes(Directory[]) can corrupt index
Date Tue, 12 Dec 2006 12:52:22 GMT
    [ http://issues.apache.org/jira/browse/LUCENE-702?page=comments#action_12457691 ] 
            
Michael McCandless commented on LUCENE-702:
-------------------------------------------

Thanks for the review Ning!

> 1 In the two rollbacks in mergeSegments (where inTransaction is
> false), the segmentInfos' generation is not always rolled back. So
> something like this could happen: two consecutive successful commits
> write segments_3 and segments_5, respectively. Nothing is broken,
> but it'd be nice to roll back completely (even for the IndexWriter
> instance) when a commit fails.

This is actually intentional: I don't want to write to the same
segments_N filename, ever, on the possibility that a reader may be
reading it.  Admittedly, this should be quite rare (filling up disk
and then experiencing contention, only on Windows), but still I wanted
to keep "write once" even in this case.

> 2 Code serving two purposes are (and has been) mixed in
> mergeSegments: one to merge segments and create compound file if
> necessary, the other to commit or roll back when inTransaction is
> false. It'd be nice if the two could be separated: optimize and
> maybeMergeSegments call mergeSegmentsAndCommit, which creates a
> transaction, calls mergeSegments and commits or rolls back;
> mergeSegments doesn't deal with commit or rollback. However,
> currently the non-CFS version is committed first even if
> useCompoundFile is true. Until that's changed, mergeSegments
> probably has to continue serving both purposes.

I agree, mergeSegments is doing two different things now: merging
segments, and, committing this change (in 2 steps, for the CFS case)
into the index (if not in a transaction).

I had in fact tried putting a transaction up at the
optimize/maybeMergeSegments level, and it worked, but there was one
severe drawback: the max temp free space required would be [up to] 2X
the starting size of the segments-to-be-merged because the original
segments would be there (1X), the newly merged separate segment files
would be there (another 1X) and the just-created CFS segment file
would also be there (another 1X).

Whereas the max temp space required now is 1X the starting size of
segments-to-be-merged.

So I think this (doing 2 commits with the current source before my
patch) is intentional, to keep the temp free space required at 1X.

It's also [relatively] OK to commit that first time, at least
functionally: the index is consistent and has all docs.  The only
downside is that if you hit disk full or other exception when creating
the CFS file then your index has one segment not in compound format (I
will call this out in the javadocs).

OK I've added a unit-test that explicitly tests max temp space
required by just optimize and asserts that it's at most 1X starting
index size.  Will attach a patch shortly!


> Disk full during addIndexes(Directory[]) can corrupt index
> ----------------------------------------------------------
>
>                 Key: LUCENE-702
>                 URL: http://issues.apache.org/jira/browse/LUCENE-702
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Index
>    Affects Versions: 2.1
>            Reporter: Michael McCandless
>         Assigned To: Michael McCandless
>         Attachments: LUCENE-702.patch
>
>
> This is a spinoff of LUCENE-555
> If the disk fills up during this call then the committed segments file can reference
segments that were not written.  Then the whole index becomes unusable.
> Does anyone know of any other cases where disk full could corrupt the index?
> I think disk full should worse lose the documents that were "in flight" at the time.
 It shouldn't corrupt the index.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


Mime
View raw message