lucene-dev mailing list archives

From Chuck Williams <ch...@manawiz.com>
Subject Re: After kill -9 index was corrupt
Date Fri, 29 Sep 2006 23:37:29 GMT
Hi All,

I found this issue.  There is no problem in Lucene, and I'd like to
close this thread with that assertion to avoid confusing future archive
searchers and readers.

The index was actually not corrupt at all.  I use ParallelReader and
ParallelWriter.  A kill -9 can leave the subindexes out of sync.  My
recovery code repairs this on restart by noticing the indexes are
out-of-sync, deleting the document(s) that were added to some
subindex(es) but not the other(s), then optimizing to resync the doc-ids.
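
As a rough illustration of that resync step, here is a toy model that does
not use Lucene at all.  It assumes documents are appended to every subindex
in the same order, so an interrupted add can only leave extra documents at
the tail of some subindexes.  The class and method names are hypothetical,
and each subindex is just a list of document ids.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy model of resyncing parallel subindexes; this is not Lucene code. */
final class ParallelResyncSketch {

    /** Returns the number of documents that every subindex contains. */
    static int commonDocCount(Map<String, List<String>> subindexes) {
        if (subindexes.isEmpty()) {
            return 0;
        }
        int common = Integer.MAX_VALUE;
        for (List<String> docs : subindexes.values()) {
            common = Math.min(common, docs.size());
        }
        return common;
    }

    /**
     * Drops trailing documents that reached only some subindexes.
     * Assumption: all subindexes receive documents in the same order.
     * Returns the number of documents removed.
     */
    static int resync(Map<String, List<String>> subindexes) {
        int common = commonDocCount(subindexes);
        int removed = 0;
        for (List<String> docs : subindexes.values()) {
            while (docs.size() > common) {
                docs.remove(docs.size() - 1);
                removed++;
            }
        }
        return removed;
    }

    public static void main(String[] args) {
        Map<String, List<String>> subindexes = new LinkedHashMap<>();
        subindexes.put("text", new ArrayList<>(Arrays.asList("d0", "d1", "d2")));
        subindexes.put("meta", new ArrayList<>(Arrays.asList("d0", "d1")));
        int removed = resync(subindexes);
        System.out.println("removed " + removed + ", now " + subindexes);
    }
}
```

In a real index the delete only marks documents, which is why the optimize
is still needed afterwards to bring the doc-ids back in line.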

The issue is that my bulk updater does not at present support compound
file format and the recovery code forgot to turn that off prior to the
optimize!  Thus a .cfs file was created, which confused the bulk updater
-- it did not see a segment that was inside the cfs.
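
One inexpensive safeguard against this kind of mistake would be for a tool
that cannot read compound files to check the index directory for .cfs files
before it starts.  The sketch below uses only java.nio and is not part of
Lucene or of my bulk updater; the class name and the idea of refusing to run
are assumptions for illustration.  In the recovery code itself the fix is to
turn compound files off on the writer before optimizing, which in Lucene of
this vintage is IndexWriter.setUseCompoundFile(false).

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical pre-flight check for tools that cannot read .cfs files. */
final class CompoundFileGuard {

    /** Returns the sorted names of all compound files directly in indexDir. */
    static List<String> compoundFiles(Path indexDir) throws IOException {
        List<String> found = new ArrayList<>();
        // The glob matches file names ending in ".cfs" in this directory only.
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(indexDir, "*.cfs")) {
            for (Path p : stream) {
                found.add(p.getFileName().toString());
            }
        }
        Collections.sort(found);
        return found;
    }

    public static void main(String[] args) throws IOException {
        // Demo on a throwaway directory with made-up segment file names.
        Path dir = Files.createTempDirectory("index");
        Files.createFile(dir.resolve("_a.cfs"));
        Files.createFile(dir.resolve("_b.fnm"));
        List<String> cfs = compoundFiles(dir);
        if (!cfs.isEmpty()) {
            System.out.println("refusing to run, compound files present: " + cfs);
        }
    }
}
```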

Sorry for the false alarm and thanks to all who helped with the original
question/concern,

Chuck


Chuck Williams wrote on 09/11/2006 12:10 PM:
> I do have one module that does custom index operations.  This is my bulk
> updater.  It creates new index files for the segments it modifies and a
> new segments file, then uses the same commit mechanism as merging. 
> I.e., it copies its new segments file into "segments" with the commit
> lock only after all the new index files are closed.  In the problem
> scenario, I don't have any indication that the bulk updater was
> complicit but am of course fully exploring that possibility as well.
>
> The index was only reopened by the process after the kill -9 of the old
> process was completed, so there were not any threads still working on
> the old process.
>
> This remains a mystery.  Thanks for your analysis and suggestions.  If
> you have more ideas, please keep them coming!
>
> Chuck
>
>
> robert engels wrote on 09/11/2006 10:06 AM:
>   
>> I am not stating that you did not uncover a problem. I am only stating
>> that it is not due to OS level caching.
>>
>> Maybe your sequence of events triggered a reread of the index, while
>> some thread was still writing. The reread sees the 'unused segments'
>> and deletes them, and then the other thread writes the updated
>> 'segments' file.
>>
>> From what you state, it seems that you are using some custom code for
>> index writing? (Maybe the NewIndexModified stuff)? Possibly there is
>> an issue there. Do you maybe have your own cleanup code that attempts
>> to remove unused segments from the directory? If so, that appears to
>> be the likely culprit to me.
>>
>> On Sep 11, 2006, at 2:56 PM, Chuck Williams wrote:
>>
>>     
>>> robert engels wrote on 09/11/2006 07:34 AM:
>>>       
>>>> A kill -9 should not affect the OS's writing of dirty buffers
>>>> (including directory modifications). If this were the case, massive
>>>> system corruption would almost always occur every time a kill -9 was
>>>> used with any program.
>>>>
>>>> The only thing a kill -9 affects is user level buffering. The OS
>>>> always maintains a consistent view of the directory and/or file
>>>> modifications that were requested by programs.
>>>>
>>>> This entire discussion is pointless.
>>>>
>>>>         
>>> Thanks everyone for your analysis.  It appears I do not have any
>>> explanation.  In my case, the process was in gc-limbo due to the memory
>>> leak and having butted up against its -Xmx.  The process was kill -9'd
>>> and then restarted.  The OS never crashed.  The server this is on is
>>> healthy; it has been used continually since this happened without being
>>> rebooted and with no file system or other issues.  When the process was
>>> killed, one thread was merging segments as part of flushing the ram
>>> buffer while closing the index, due to the prior kill -15.  When Lucene
>>> restarted, the segments file contained a segment name for which there
>>> were no corresponding index data files.
>>>
>>> Chuck
>>>
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
>>> For additional commands, e-mail: java-dev-help@lucene.apache.org
>>>
>>>       
>>
>>     
>
>
>
>   



