lucene-dev mailing list archives

From Chuck Williams <ch...@manawiz.com>
Subject Re: After kill -9 index was corrupt
Date Mon, 11 Sep 2006 22:10:08 GMT
I do have one module that does custom index operations.  This is my bulk
updater.  It creates new index files for the segments it modifies and a
new segments file, then uses the same commit mechanism as merging. 
I.e., it copies its new segments file into "segments" with the commit
lock only after all the new index files are closed.  In the problem
scenario, I don't have any indication that the bulk updater was
complicit but am of course fully exploring that possibility as well.
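For readers following along, the ordering described above (close every new data file first, then publish the segments file under the commit lock) can be sketched roughly as below. This is a hypothetical illustration, not Lucene's code or the bulk updater itself. It assumes each segment's data is a single "<name>.dat" file and that "segments" is a plain list of names. A JVM-local "synchronized" stands in for the commit lock. Whether ATOMIC_MOVE replaces an existing target is platform-dependent; on POSIX file systems it maps to rename(2), which does.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.nio.file.StandardOpenOption;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not Lucene code: segment data is one <name>.dat file,
// and the "segments" file is a plain list of segment names, one per line.
public class SegmentsCommit {

    // Writes the bytes, forces them to the storage device, then closes the file.
    static void writeAndSync(Path file, byte[] data) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.CREATE,
                StandardOpenOption.TRUNCATE_EXISTING, StandardOpenOption.WRITE)) {
            ByteBuffer buf = ByteBuffer.wrap(data);
            while (buf.hasRemaining()) {
                ch.write(buf);
            }
            // Guards against an OS crash or power loss; a kill -9 alone does not
            // discard data that has already been handed to the OS.
            ch.force(true);
        }
    }

    // "synchronized" stands in for the commit lock (protects one JVM only).
    static synchronized void commit(Path dir, List<String> allSegments,
                                    Map<String, byte[]> newSegmentData) throws IOException {
        // 1. Write and close every new segment file before touching "segments".
        for (Map.Entry<String, byte[]> e : newSegmentData.entrySet()) {
            writeAndSync(dir.resolve(e.getKey() + ".dat"), e.getValue());
        }
        // 2. Refuse to publish a list naming a segment that has no data file,
        //    i.e. the inconsistent state that was found after the restart.
        for (String name : allSegments) {
            if (!Files.exists(dir.resolve(name + ".dat"))) {
                throw new IllegalStateException("missing data for segment " + name);
            }
        }
        // 3. Write the new list to a temporary file, then rename it over
        //    "segments" so a reader sees either the old list or the new one.
        Path tmp = dir.resolve("segments.new");
        String listing = String.join("\n", allSegments) + "\n";
        writeAndSync(tmp, listing.getBytes(StandardCharsets.UTF_8));
        Files.move(tmp, dir.resolve("segments"), StandardCopyOption.ATOMIC_MOVE);
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("index");
        commit(dir, List.of("_a"), Map.of("_a", new byte[] {1, 2, 3}));
        commit(dir, List.of("_a", "_b"), Map.of("_b", new byte[] {4, 5}));
        System.out.println(Files.readAllLines(dir.resolve("segments"))); // prints [_a, _b]
    }
}
```

With this ordering, a process killed at any point leaves "segments" naming only segments whose data files were already closed, which is why a listed segment with no data files points to something outside the commit path itself.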

The index was reopened only after the kill -9 of the old process had
completed, so no threads from the old process could still have been
running.

This remains a mystery.  Thanks for your analysis and suggestions.  If
you have more ideas, please keep them coming!

Chuck


robert engels wrote on 09/11/2006 10:06 AM:
> I am not stating that you did not uncover a problem. I am only stating
> that it is not due to OS level caching.
>
> Maybe your sequence of events triggered a reread of the index, while
> some thread was still writing. The reread sees the 'unused segments'
> and deletes them, and then the other thread writes the updated
> 'segments' file.
>
> From what you state, it seems that you are using some custom code for
> index writing (maybe the NewIndexModified stuff)? Possibly there is
> an issue there. Do you maybe have your own cleanup code that attempts
> to remove unused segments from the directory? If so, that appears to
> be the likely culprit to me.
>
> On Sep 11, 2006, at 2:56 PM, Chuck Williams wrote:
>
>> robert engels wrote on 09/11/2006 07:34 AM:
>>> A kill -9 should not affect the OS's writing of dirty buffers
>>> (including directory modifications). If this were the case, massive
>>> system corruption would almost always occur every time a kill -9 was
>>> used with any program.
>>>
>>> The only thing a kill -9 affects is user level buffering. The OS
>>> always maintains a consistent view of the directory and file
>>> modifications requested by programs.
>>>
>>> This entire discussion is pointless.
>>>
>> Thanks everyone for your analysis.  It appears I do not have any
>> explanation.  In my case, the process was in gc-limbo due to the memory
>> leak and having butted up against its -Xmx.  The process was kill -9'd
>> and then restarted.  The OS never crashed.  The server this is on is
>> healthy; it has been used continually since this happened without being
>> rebooted and no file system or any other issues.  When the process was
>> killed, one thread was merging segments as part of flushing the ram
>> buffer while closing the index, due to the prior kill -15.  When Lucene
>> restarted, the segments file contained a segment name for which there
>> were no corresponding index data files.
>>
>> Chuck
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-dev-help@lucene.apache.org
>>
>



