lucene-dev mailing list archives

From "Michael McCandless (JIRA)" <>
Subject [jira] Commented: (LUCENE-1044) Behavior on hard power shutdown
Date Wed, 28 Nov 2007 17:47:43 GMT


Michael McCandless commented on LUCENE-1044:

> When autoCommit is true, then we should periodically commit automatically. When autoCommit
> is false, then nothing should be committed until the IndexWriter is closed. The ambiguous
> case is flush(). I think the reason for exposing flush() was to permit folks to commit without
> closing, so I think flush() should commit too, but we could add a separate commit() method
> that flushes and commits.

I think deprecating flush(), renaming it to commit(), and clarifying
the semantics to mean that commit() flushes pending docs/deletes,
commits a new segments_N, syncs all files referenced by this commit,
and blocks until the sync is complete, would make sense?  And,
commit() would in fact commit even when autoCommit is false (flush()
doesn't commit now when autoCommit=false, which is indeed confusing).
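
To make the proposed ordering concrete, here is a minimal sketch using only java.nio. It is not Lucene's IndexWriter: the class DurableCommitSketch, its add() method, and the plain-text segments_N listing are all hypothetical and exist only for illustration. It shows the order described above: flush pending data, sync every referenced file, then write and sync segments_N, blocking until each sync returns.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical illustration only; not Lucene's IndexWriter or its file format. */
final class DurableCommitSketch {
    private final Path dir;
    // Buffered data that has not been written to disk yet (stands in for pending docs/deletes).
    private final Map<String, byte[]> pending = new LinkedHashMap<>();
    // Names of files that earlier commits have already written and synced.
    private final Set<String> committedFiles = new LinkedHashSet<>();
    private long generation = 0;

    DurableCommitSketch(Path dir) {
        this.dir = dir;
    }

    /** Buffers data in memory; nothing touches the disk until commit(). */
    void add(String fileName, byte[] data) {
        pending.put(fileName, data);
    }

    /** Flushes pending data, syncs it, then writes and syncs segments_N. Blocks until done. */
    long commit() throws IOException {
        // Step 1: flush and sync every file the new commit will reference.
        for (Map.Entry<String, byte[]> e : pending.entrySet()) {
            writeAndSync(dir.resolve(e.getKey()), e.getValue());
            committedFiles.add(e.getKey());
        }
        pending.clear();

        // Step 2: only now write the commit point, so it never refers to unsynced files.
        generation++;
        byte[] listing = String.join("\n", committedFiles).getBytes(StandardCharsets.UTF_8);
        writeAndSync(dir.resolve("segments_" + generation), listing);
        return generation;
    }

    private static void writeAndSync(Path file, byte[] data) throws IOException {
        try (FileChannel ch = FileChannel.open(file,
                StandardOpenOption.CREATE,
                StandardOpenOption.WRITE,
                StandardOpenOption.TRUNCATE_EXISTING)) {
            ByteBuffer buf = ByteBuffer.wrap(data);
            while (buf.hasRemaining()) {
                ch.write(buf);
            }
            // force(true) asks the OS to push content and metadata to the storage device
            // and returns only after the OS reports completion.
            ch.force(true);
        }
    }
}
```

The point of the ordering is that a power loss between step 1 and step 2 leaves the previous segments_N intact, so the reader can still open the last complete commit.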

> Perhaps the semantics of autoCommit=true should be altered so that it commits less than every
> flush. Is that what you were proposing? If so, then I think it's a good solution. Prior to
> 2.2 the commit semantics were poorly defined. Folks were encouraged to close() their IndexWriter
> to persist changes, and that's about all we said. 2.2's docs say that things are committed
> at every flush, but there was no sync, so I don't think changing this could break any applications.

> So I'm +1 for changing autoCommit=true to sync less than every flush, e.g., only after merges.
> I'd also argue that we should be vague in the documentation about precisely when autoCommit=true
> commits. If someone needs to know exactly when things are committed then they should be encouraged
> to explicitly flush(), not to rely on autoCommit.

OK, I will test the "sync only when committing a merge" approach for
performance.  Hopefully a foreground sync() is fine, given that with
ConcurrentMergeScheduler the merge already runs in a background thread.  This
would be a nice simplification.
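
As a rough illustration of why syncing only at merge commits should cost far fewer syncs than syncing at every flush, here is a small single-threaded simulation. Everything in it is an assumption made for the example: the class MergeSyncPolicySketch, the rule that a merge fires once mergeFactor unmerged segments accumulate, and the counters. It does not model Lucene's actual merge policy, multi-level merges, or the background merge thread.

```java
/** Hypothetical simulation; the names and the merge trigger are assumptions, not Lucene code. */
final class MergeSyncPolicySketch {
    private final int mergeFactor;
    private int unmergedSegments = 0;
    private int flushCount = 0;
    private int syncCount = 0;

    MergeSyncPolicySketch(int mergeFactor) {
        if (mergeFactor < 2) {
            throw new IllegalArgumentException("mergeFactor must be >= 2");
        }
        this.mergeFactor = mergeFactor;
    }

    /** A flush creates a new segment but does not sync anything. */
    void flush() {
        flushCount++;
        unmergedSegments++;
        if (unmergedSegments >= mergeFactor) {
            mergeAndSync();
        }
    }

    /** Committing a merge is the only place where this sketch counts a sync. */
    private void mergeAndSync() {
        // The merged segment moves to a higher level, which this sketch does not model.
        unmergedSegments = 0;
        syncCount++;
    }

    int flushCount() {
        return flushCount;
    }

    int syncCount() {
        return syncCount;
    }
}
```

With a mergeFactor of 10, 100 calls to flush() produce 10 counted syncs in this model, instead of the 100 that a sync-per-flush policy would need. Whether the real savings look like this is exactly what the performance test has to show.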

And I agree we should be vague about, and users should never rely on,
precisely when Lucene has really committed (sync'd) the changes to
disk.  I'll fix the javadocs.

> Behavior on hard power shutdown
> -------------------------------
>                 Key: LUCENE-1044
>                 URL:
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Index
>         Environment: Windows Server 2003, Standard Edition, Sun Hotspot Java 1.5
>            Reporter: venkat rangan
>            Assignee: Michael McCandless
>             Fix For: 2.3
>         Attachments:, LUCENE-1044.patch, LUCENE-1044.take2.patch,
LUCENE-1044.take3.patch, LUCENE-1044.take4.patch
> When indexing a large number of documents, upon a hard power failure (e.g. pull the
power cord), the index seems to get corrupted. We start a Java application as a Windows Service,
and feed it documents. In some cases (after an index size of 1.7GB, with 30-40 index segment
.cfs files), the following is observed.
> The 'segments' file contains only zeros. Its size is 265 bytes - all bytes are zeros.
> The 'deleted' file also contains only zeros. Its size is 85 bytes - all bytes are zeros.
> Before corruption, the segments file and deleted file appear to be correct. After this
corruption, the index is corrupted and lost.
> This is a problem observed in Lucene 1.4.3. We are not able to upgrade our customer deployments
to 1.9 or a later version, but would be happy to back-port a patch, if the patch is small enough
and if this problem is already solved.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
