lucene-dev mailing list archives

From "Michael McCandless (JIRA)" <j...@apache.org>
Subject [jira] Commented: (LUCENE-1044) Behavior on hard power shutdown
Date Sat, 10 Nov 2007 19:05:50 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-1044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12541576
] 

Michael McCandless commented on LUCENE-1044:
--------------------------------------------


{quote}
Was that compound or non-compound index format?  I imagine
non-compound will take a bigger hit since each file will be
synchronized separately and in a serialized fashion.
{quote}

The test was with compound file.

But the close() on each component file that goes into the compound
file also does a sync, so the compound format may actually take a
slightly bigger hit, since it adds one more sync() for the compound
file itself?

We can't safely remove the sync() on each component file before
building the compound file because we currently do a commit of the new
segments file before building the compound file.

I guess we could revisit whether that commit (before building the
compound file) is really necessary?  I think it's there from when
flushing & merging were the same thing, and you do want to do this
when merging to save 1X extra peak on the disk usage, but now that
flushing is separate from merging we could remove that intermediate
commit?

{quote}
I also imagine that the hit will be larger for a weaker disk
subsystem, and for usage patterns that continually add a few docs and
close?
{quote}

OK I'll run the same test, but once on a laptop and once over NFS to
see what the cost is for those cases.

Yes, continually adding docs and flushing/closing your writer will in
theory be most affected here.  I think for such apps performance is
not usually the top priority (indexing latency is)?  I.e., if you
wanted performance you would batch up the added docs more?  Anyway, in
such cases users can turn off sync() if they are willing to risk it.

{quote}
Is a sync before every file close really needed, or can some of them
be avoided when autocommit==false?
{quote}

It's somewhat tricky to safely remove sync() even when
autoCommit=false, because you don't know at close() whether this file
you are closing will be referenced (and not merged away) when the
commit is finally done (when IndexWriter is closed).

If there were a way to sync a file after having closed it (is there?)
then we could go and sync() all new files we had created that are now
referenced by the segments file we are writing.
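
On the question of syncing a file after it has been closed: one
possibility, sketched below, is to reopen the file and call sync() on
the new descriptor.  This is only an illustration and not part of the
attached patches.  The names SyncAfterClose and syncClosedFile are made
up for the example, and whether a sync through a fresh descriptor also
forces data written through an earlier, already-closed descriptor
depends on the OS and file system, so it would need to be verified on
each target platform (NFS in particular).

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.RandomAccessFile;

// Hypothetical helper, not Lucene API: sync a file that was already closed.
class SyncAfterClose {

    // Reopen the file and ask the OS to flush its contents to stable storage.
    // Assumption: the OS applies the sync to the file as a whole, not only to
    // writes made through this particular descriptor.
    static void syncClosedFile(File file) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file, "rw")) {
            raf.getFD().sync();
        }
    }

    public static void main(String[] args) throws IOException {
        File f = File.createTempFile("segment", ".tmp");
        try (FileOutputStream out = new FileOutputStream(f)) {
            out.write(new byte[] {1, 2, 3}); // written and closed without a sync
        }
        syncClosedFile(f); // synced later, e.g. just before writing the segments file
        f.delete();
    }
}
```

With something like this, the writer could defer syncing until commit
time and only sync the files that the new segments file references.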

Also, I was thinking we could start simple (call sync() before every
close()) and then with time, and if necessary, work out smarter ways
to safely remove some of those sync()'s.
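
To make the "start simple" approach concrete, here is a rough sketch of
an output stream that syncs before every close, with a boolean to turn
the sync off.  The class name SyncingFileOutput is hypothetical and the
code is not taken from the patch; only the idea of a doSync switch
comes from this issue.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;

// Hypothetical illustration, not Lucene API: sync the file before closing it.
class SyncingFileOutput extends FileOutputStream {
    private final boolean doSync;
    private boolean alreadyClosed;

    SyncingFileOutput(File file, boolean doSync) throws IOException {
        super(file);
        this.doSync = doSync;
    }

    @Override
    public void close() throws IOException {
        if (alreadyClosed) {
            return; // a second close() must not try to sync an invalid descriptor
        }
        alreadyClosed = true;
        try {
            if (doSync) {
                getFD().sync(); // force written bytes to stable storage first
            }
        } finally {
            super.close(); // always release the descriptor, even if sync fails
        }
    }

    public static void main(String[] args) throws IOException {
        File f = File.createTempFile("segment", ".tmp");
        SyncingFileOutput out = new SyncingFileOutput(f, true);
        out.write(new byte[] {1, 2, 3, 4});
        out.close(); // syncs, then closes
        f.delete();
    }
}
```

Passing doSync=false gives the unsynced behavior for users who prefer
speed and can cheaply rebuild the index.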

{quote}
Also, the 'sync' should be optional. Berkeley DB offers similar
functionality.
{quote}

It is optional: I added doSync boolean to
FSDirectory.getDirectory(...).

And, I agree: for cases where there is very low cost to regenerate the
index, and you want absolute best performance, you can turn off
syncing.

> Behavior on hard power shutdown
> -------------------------------
>
>                 Key: LUCENE-1044
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1044
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Index
>         Environment: Windows Server 2003, Standard Edition, Sun Hotspot Java 1.5
>            Reporter: venkat rangan
>            Assignee: Michael McCandless
>             Fix For: 2.3
>
>         Attachments: LUCENE-1044.patch, LUCENE-1044.take2.patch, LUCENE-1044.take3.patch
>
>
> When indexing a large number of documents, upon a hard power failure (e.g. pull the
> power cord), the index seems to get corrupted. We start a Java application as a Windows
> Service, and feed it documents. In some cases (after an index size of 1.7GB, with 30-40
> index segment .cfs files), the following is observed.
> The 'segments' file contains only zeros. Its size is 265 bytes - all bytes are zeros.
> The 'deleted' file also contains only zeros. Its size is 85 bytes - all bytes are zeros.
> Before corruption, the segments file and deleted file appear to be correct. After this
> corruption, the index is corrupted and lost.
> This is a problem observed in Lucene 1.4.3. We are not able to upgrade our customer
> deployments to 1.9 or later version, but would be happy to back-port a patch, if the
> patch is small enough and if this problem is already solved.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

