lucene-dev mailing list archives

From "Doug Cutting (JIRA)" <j...@apache.org>
Subject [jira] Commented: (LUCENE-1044) Behavior on hard power shutdown
Date Wed, 28 Nov 2007 17:20:43 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-1044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12546306 ]

Doug Cutting commented on LUCENE-1044:
--------------------------------------

> But must every "automatic buffer flush" by IndexWriter really be a "permanent commit"?

When autoCommit is true, then we should periodically commit automatically.  When autoCommit
is false, then nothing should be committed until the IndexWriter is closed.  The ambiguous
case is flush().  I think the reason for exposing flush() was to permit folks to commit without
closing, so I think flush() should commit too, but we could add a separate commit() method
that flushes and commits.
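A minimal Java sketch of the semantics described above, assuming a hypothetical in-memory model (`CommitModel` and `flushBecauseBufferFull()` are illustrative names, not Lucene's IndexWriter API): with autoCommit=false an automatic buffer flush writes data out but commits nothing, while an explicit commit() or close() flushes and then commits.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical in-memory model of the proposed commit semantics.
 * Not Lucene's IndexWriter API: the three lists stand for RAM-buffered,
 * flushed-but-uncommitted, and committed (reader-visible) documents.
 */
public class CommitModel {
    private final boolean autoCommit;
    private final List<String> buffered = new ArrayList<>();
    private final List<String> flushed = new ArrayList<>();
    private final List<String> committed = new ArrayList<>();

    public CommitModel(boolean autoCommit) {
        this.autoCommit = autoCommit;
    }

    public void addDocument(String doc) {
        buffered.add(doc);
    }

    /** Automatic flush when the RAM buffer fills: commits only if autoCommit is true. */
    public void flushBecauseBufferFull() {
        moveBufferedToFlushed();
        if (autoCommit) {
            publish(); // simplification: a real writer might commit less often than this
        }
        // autoCommit=false: flushed data stays uncommitted until commit() or close()
    }

    /** Explicit commit: flushes anything still buffered, then commits it. */
    public void commit() {
        moveBufferedToFlushed();
        publish();
    }

    public void close() {
        commit();
    }

    public int committedCount() {
        return committed.size();
    }

    private void moveBufferedToFlushed() {
        flushed.addAll(buffered);
        buffered.clear();
    }

    private void publish() {
        committed.addAll(flushed);
        flushed.clear();
    }

    public static void main(String[] args) {
        CommitModel w = new CommitModel(false);
        w.addDocument("doc1");
        w.flushBecauseBufferFull();
        System.out.println(w.committedCount()); // 0: nothing committed yet
        w.commit();
        System.out.println(w.committedCount()); // 1
    }
}
```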

> People who upgrade will suddenly get much worse performance.

Yes, that would be bad.  Perhaps the semantics of autoCommit=true should be altered so that
it commits less often than on every flush.  Is that what you were proposing?  If so, then I think it's
a good solution.  Prior to 2.2 the commit semantics were poorly defined.  Folks were encouraged
to close() their IndexWriter to persist changes, and that's about all we said.  2.2's docs
say that things are committed at every flush, but there was no sync, so I don't think changing
this could break any applications.
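To make "no sync" concrete, here is a sketch, assuming Java 7+ NIO (`java.nio.file`, which postdates the Lucene versions discussed here) and hypothetical file names, of forcing the data files to disk with `FileChannel.force(true)` before writing and syncing the commit point. Writing each commit point under a new name (in the spirit of generation-numbered `segments_N` files) means an interrupted write cannot zero out the previous commit point.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Arrays;
import java.util.List;

public class SyncBeforeCommit {

    /** Writes data to file; if sync is true, forces it to stable storage before returning. */
    static void write(Path file, byte[] data, boolean sync) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.CREATE,
                StandardOpenOption.WRITE, StandardOpenOption.TRUNCATE_EXISTING)) {
            ByteBuffer buf = ByteBuffer.wrap(data);
            while (buf.hasRemaining()) {
                ch.write(buf);
            }
            if (sync) {
                ch.force(true); // push file content and metadata to the device
            }
        }
    }

    /** Forces an already-written file to stable storage. */
    static void sync(Path file) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.WRITE)) {
            ch.force(true);
        }
    }

    /**
     * Hypothetical durable commit: every data file the commit point references is
     * synced BEFORE the commit point itself is written and synced. Without the loop,
     * a power loss could leave a commit point referring to data never written to disk.
     */
    static void commit(Path commitFile, byte[] commitData, List<Path> dataFiles) throws IOException {
        for (Path f : dataFiles) {
            sync(f);
        }
        write(commitFile, commitData, true);
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("sync-sketch");
        Path data = dir.resolve("_0.cfs"); // hypothetical segment file
        write(data, "segment bytes".getBytes(StandardCharsets.UTF_8), false); // flushed, not synced
        Path commitFile = dir.resolve("segments_1"); // hypothetical new-generation commit point
        commit(commitFile, "_0".getBytes(StandardCharsets.UTF_8), Arrays.asList(data));
        System.out.println(new String(Files.readAllBytes(commitFile), StandardCharsets.UTF_8)); // _0
    }
}
```

The temporary directory is left in place; a real caller would manage its own index directory.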

So I'm +1 for changing autoCommit=true to sync less often than on every flush, e.g., only after merges.
I'd also argue that we should be vague in the documentation about precisely when autoCommit=true
commits.  If someone needs to know exactly when things are committed then they should be encouraged
to explicitly flush(), not to rely on autoCommit.
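A toy model of the proposed change, assuming a hypothetical event stream of flushes and merges (the `Event` and `Policy` names are illustrative, not Lucene classes): it counts how many synced commits each autoCommit=true policy would perform, as a rough proxy for the fsync cost that syncing only after merges avoids.

```java
import java.util.Arrays;
import java.util.List;

public class AutoCommitPolicySketch {
    enum Event { FLUSH, MERGE, CLOSE }
    enum Policy { SYNC_EVERY_FLUSH, SYNC_AFTER_MERGE }

    /** Counts synced commits a hypothetical autoCommit=true writer would perform. */
    static int countSyncedCommits(Policy policy, List<Event> events) {
        int commits = 0;
        for (Event e : events) {
            switch (e) {
                case FLUSH:
                    if (policy == Policy.SYNC_EVERY_FLUSH) commits++;
                    break;
                case MERGE:
                    if (policy == Policy.SYNC_AFTER_MERGE) commits++;
                    break;
                case CLOSE:
                    commits++; // close() commits under either policy
                    break;
            }
        }
        return commits;
    }

    public static void main(String[] args) {
        List<Event> events = Arrays.asList(
                Event.FLUSH, Event.FLUSH, Event.FLUSH, Event.MERGE,
                Event.FLUSH, Event.FLUSH, Event.FLUSH, Event.MERGE,
                Event.FLUSH, Event.CLOSE);
        System.out.println(countSyncedCommits(Policy.SYNC_EVERY_FLUSH, events)); // 8
        System.out.println(countSyncedCommits(Policy.SYNC_AFTER_MERGE, events)); // 3
    }
}
```

With this sample stream, syncing on every flush costs 8 commits versus 3 when syncing only after merges.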

> Behavior on hard power shutdown
> -------------------------------
>
>                 Key: LUCENE-1044
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1044
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Index
>         Environment: Windows Server 2003, Standard Edition, Sun Hotspot Java 1.5
>            Reporter: venkat rangan
>            Assignee: Michael McCandless
>             Fix For: 2.3
>
>         Attachments: FSyncPerfTest.java, LUCENE-1044.patch, LUCENE-1044.take2.patch, LUCENE-1044.take3.patch, LUCENE-1044.take4.patch
>
>
> When indexing a large number of documents, upon a hard power failure (e.g. pulling the power cord), the index seems to get corrupted. We start a Java application as a Windows Service and feed it documents. In some cases (after an index size of 1.7GB, with 30-40 index segment .cfs files), the following is observed.
> The 'segments' file contains only zeros. Its size is 265 bytes - all bytes are zeros.
> The 'deleted' file also contains only zeros. Its size is 85 bytes - all bytes are zeros.
> Before the failure, the segments file and deleted file appear to be correct. After this corruption, the index is lost.
> This problem is observed in Lucene 1.4.3. We are not able to upgrade our customer deployments to 1.9 or a later version, but would be happy to back-port a patch if it is small enough and if this problem is already solved.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

