lucene-dev mailing list archives

From Chuck Williams <ch...@manawiz.com>
Subject Re: After kill -9 index was corrupt
Date Mon, 11 Sep 2006 07:50:20 GMT


Paul Elschot wrote on 09/10/2006 09:15 PM:
> On Monday 11 September 2006 02:24, Chuck Williams wrote:
>   
>> Hi All,
>>
>> An application of ours under development had a memory leak that caused
>> it to slow interminably.  On linux, the application did not respond to
>> kill -15 in a reasonable time, so kill -9 was used to forcibly terminate
>> it.  After this the segments file contained a reference to a segment
>> whose index files were not present.  I.e., the index was corrupt and
>> Lucene could not open it.
>>
>> A thread dump at the time of the kill -9 shows that Lucene was merging
>> segments inside IndexWriter.close().  Since segment merging only commits
>> (updates the segments file) after the newly merged segment(s) are
>> complete, I expect this is not the actual problem.
>>
>> Could a kill -9 prevent data from reaching disk for files that were
>> previously closed?  If so, then Lucene's index can become corrupt after
>> kill -9.  In this case, it is possible that a prior merge created new
>> segment index files, updated the segments file, closed everything, the
>> segments file made it to disk, but the index data files and/or their
>> directory entries did not.
>>
>> If this is the case, it seems to me that flush() and
>> FileDescriptor.sync() are required on each index file prior to close()
>> to guarantee no corruption.  Additionally a FileDescriptor.sync() is
>> also probably required on the index directory to ensure the directory
>> entries have been persisted.
>>     
>
> Shouldn't the sync be done after closing the files? I'm using sync in a
> (un*x) shell script after merges before backups. I'd prefer to have some
> more of this syncing built into Lucene because the shell sync syncs all
> disks which might be more than needed. So far I've had no problems,
> so there was no need to investigate further.
>   
I believe FileDescriptor.sync() uses fsync and not sync on linux.  A
FileDescriptor is no longer valid after the stream is closed, so sync()
could not be done on a closed stream.  I think the correct protocol is
flush() the stream, sync() its FD, then close() it.
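
A minimal sketch of that protocol, assuming a plain FileOutputStream
wrapped in a BufferedOutputStream.  The class and method names
(DurableWrite, writeDurably) are hypothetical and are not Lucene code;
this only illustrates the flush/sync/close ordering, and it does not
address syncing the directory entry.

```java
import java.io.BufferedOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;

// Hypothetical helper, not part of Lucene: shows flush() -> sync() -> close().
final class DurableWrite {
    static void writeDurably(File file, byte[] data) throws IOException {
        FileOutputStream fos = new FileOutputStream(file);
        try {
            BufferedOutputStream out = new BufferedOutputStream(fos);
            out.write(data);
            // 1. flush(): push bytes buffered in the JVM down to the OS.
            out.flush();
            // 2. sync(): ask the OS to force the file's data to the device.
            //    The descriptor must still be open, so this precedes close().
            fos.getFD().sync();
        } finally {
            // 3. close(): release the descriptor only after the sync.
            fos.close();
        }
    }
}
```

sync() throws SyncFailedException if the OS cannot guarantee that the
buffers reached the device, so a caller can at least detect the failure
rather than assume the bytes are safe.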

Paul, do you know if kill -9 can create the situation where bytes from a
closed file never make it to disk in linux?  I think Lucene needs sync()
in any event to be robust with respect to OS crashes, but am wondering
if this explains my kill -9 problem as well.  It seems bogus to me that
a closed file's bytes would fail to be persisted unless the OS crashed,
but I can't find any other explanation and I can't find any definitive
information to affirm or refute this possible side effect of kill -9.

The issue I've got is that my index can never lose documents.  So I've
implemented journaling on top of Lucene where only the last
maxBufferedDocs documents are journaled and the whole journal is reset
after close().  My application has no way to know when the bytes make it
to disk, and so cannot manage its journal properly unless Lucene ensures
index integrity with sync()'s.

Chuck



