lucene-dev mailing list archives

From Mark Miller
Subject Re: detected corrupted index / performance improvement
Date Wed, 06 Feb 2008 23:42:36 GMT
Hey DM,

Just to recap an earlier thread, you need the sync and you need hardware 
that doesn't lie to you about the result of the sync.
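
For anyone who wants to see what "the sync" means from Java, here is a
minimal sketch. It is not Lucene's code; the class and method names are
made up for illustration. FileChannel.force(true) asks the OS to flush
the file's data and metadata to the device. It cannot detect a
controller or drive that acknowledges the flush without having written
anything, which is exactly the hardware problem described above.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class SyncedWrite {
    // Hypothetical helper (not a Lucene API): write all bytes, then sync.
    static void writeAndSync(Path file, byte[] data) throws IOException {
        try (FileChannel ch = FileChannel.open(file,
                StandardOpenOption.CREATE,
                StandardOpenOption.WRITE,
                StandardOpenOption.TRUNCATE_EXISTING)) {
            ByteBuffer buf = ByteBuffer.wrap(data);
            // write() may write fewer bytes than requested, so loop.
            while (buf.hasRemaining()) {
                ch.write(buf);
            }
            // Ask the OS to push data and metadata to the storage device.
            // This only helps if the hardware reports the result honestly.
            ch.force(true);
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("segment", ".bin");
        writeAndSync(tmp, new byte[] {1, 2, 3});
        System.out.println("bytes on file: " + Files.size(tmp));
        Files.delete(tmp);
    }
}
```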

Here is an excerpt about Digg running into that issue:

"They had problems with their storage system telling them writes were on 
disk when they really weren't. Controllers do this to improve the 
appearance of their performance. But what it does is leave a giant data 
integrity hole in failure scenarios. This is really a pretty common 
problem and can be hard to fix, depending on your hardware setup."

There is a lot of good stuff relating to this in the discussion 
surrounding the JIRA issue.
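
To make the marker idea from DM's quoted message concrete, here is a
rough sketch of the INVALID-then-VALID variant. Everything in it is an
assumption for illustration: the class name, the 8-byte patterns, and
the choice to keep the marker in a fixed slot at the start of the file.
Note that it deliberately does no sync, so Robert's objection applies
in full: after a crash, a VALID marker on disk does not prove that the
payload blocks made it there too.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

public class MarkerSketch {
    enum State { VALID, INVALID, UNKNOWN }

    // Hypothetical fixed-size patterns; both must have the same length.
    static final byte[] VALID_MARK =
            "MARK_OK!".getBytes(StandardCharsets.US_ASCII);
    static final byte[] INVALID_MARK =
            "MARK_BAD".getBytes(StandardCharsets.US_ASCII);

    // Write INVALID first, then the payload, then flip the slot to VALID.
    static void write(Path file, byte[] payload) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "rw")) {
            raf.setLength(0);
            raf.write(INVALID_MARK);   // operation in progress
            raf.write(payload);
            raf.seek(0);
            raf.write(VALID_MARK);     // operation reached its end
            // No sync here: the OS may still persist these writes in any order.
        }
    }

    // UNKNOWN covers files written before the marker scheme existed.
    static State check(Path file) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
            if (raf.length() < VALID_MARK.length) {
                return State.UNKNOWN;
            }
            byte[] head = new byte[VALID_MARK.length];
            raf.readFully(head);
            if (Arrays.equals(head, VALID_MARK)) return State.VALID;
            if (Arrays.equals(head, INVALID_MARK)) return State.INVALID;
            return State.UNKNOWN;
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("marker", ".bin");
        write(tmp, new byte[] {42, 43, 44});
        System.out.println(check(tmp));
        Files.delete(tmp);
    }
}
```

The three-way result mirrors DM's last point: a file with neither
pattern predates the scheme, so nothing can be said about it.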

robert engels wrote:
> That doesn't help, with lazy writing/buffering by the OS, there is no 
> guarantee that if the last written block is ok, that earlier blocks in 
> the file are....
> The OS/drive is going to physically write them in the most efficient 
> manner. Only after a sync would this hold true (which is what we are 
> trying to avoid).
> On Feb 6, 2008, at 5:15 PM, DM Smith wrote:
>> On Feb 6, 2008, at 5:42 PM, Michael McCandless wrote:
>>> robert engels wrote:
>>>> Do we have any way of determining if a segment is definitely 
>>>> OK/VALID ?
>>> The only way I know is the CheckIndex tool, and it's rather slow (and
>>> it's not clear that it always catches all corruption).
>> Just a thought. It seems that the discussion has revolved around 
>> whether a crash or similar event has left the file in an inconsistent 
>> state. Without looking into how it is actually done, I'm going to 
>> guess that the writing is done from the start of the file to its end. 
>> That is, no "out of order" writing.
>> If this is the case, how about adding a marker to the end of the file 
>> of a known size and pattern. If it is present then it is presumed 
>> that there were no errors in getting to that point.
>> Even with out of order writing, one could write an 'INVALID' marker 
>> at the beginning of the operation and then upon reaching the end of 
>> the writing, replace it with the valid marker.
>> If neither marker is found then the index is one from before the 
>> capability was added and nothing can be said about the validity.
>> -- DM
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail:
>> For additional commands, e-mail:
