lucene-dev mailing list archives

From robert engels <reng...@ix.netcom.com>
Subject Re: detected corrupted index / performance improvement
Date Thu, 07 Feb 2008 15:21:05 GMT
This is simply not true. Two different issues are at play. You cannot  
have a true 'commit' unless it is synchronous!

LUCENE-1044 might allow the index to be brought back to a consistent
state, but not necessarily one that is consistent with a
synchronization point, i.e. the state as of the last commit.

For example, I write three documents to the index. I call commit. It
returns. After this, those documents MUST be in the index under any
conditions. LUCENE-1044 does not ensure this.

If the operations (deletes and updates) are written to a log file
first, and the log file is synced, then a failure during index
writing/merging can be fixed by rolling the log forward.
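
To make the idea concrete, here is a minimal sketch of such a log in
Java. Nothing in it is Lucene API: the class OperationLog, its record
format (a 4-byte length followed by UTF-8 text) and the operation
strings are all made up for illustration. The only point is that
commit() does not return until FileChannel.force() has returned, and
that replay() can roll the synced operations forward after a crash.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical operation log for illustration; not part of Lucene. */
class OperationLog implements AutoCloseable {
    private final FileChannel channel;

    OperationLog(Path file) throws IOException {
        channel = FileChannel.open(file,
                StandardOpenOption.CREATE,
                StandardOpenOption.WRITE,
                StandardOpenOption.APPEND);
    }

    /** Appends one operation as [length][UTF-8 bytes]; not yet durable. */
    void append(String operation) throws IOException {
        byte[] payload = operation.getBytes(StandardCharsets.UTF_8);
        ByteBuffer record = ByteBuffer.allocate(4 + payload.length);
        record.putInt(payload.length);
        record.put(payload);
        record.flip();
        while (record.hasRemaining()) {
            channel.write(record);
        }
    }

    /** Synchronous commit: returns only after the log has been forced to the device. */
    void commit() throws IOException {
        channel.force(true);
    }

    /** Reads back every complete record; a torn record at the tail is ignored. */
    static List<String> replay(Path file) throws IOException {
        List<String> operations = new ArrayList<>();
        if (!Files.exists(file)) {
            return operations;
        }
        ByteBuffer data = ByteBuffer.wrap(Files.readAllBytes(file));
        while (data.remaining() >= 4) {
            int length = data.getInt();
            if (length < 0 || length > data.remaining()) {
                break; // incomplete record from a crash mid-append
            }
            byte[] payload = new byte[length];
            data.get(payload);
            operations.add(new String(payload, StandardCharsets.UTF_8));
        }
        return operations;
    }

    @Override
    public void close() throws IOException {
        channel.close();
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("oplog", ".log");
        try (OperationLog log = new OperationLog(file)) {
            log.append("ADD doc1");
            log.append("DELETE doc1");
            log.commit(); // the caller may rely on both operations from here on
        }
        // After a crash, recovery would re-apply these to the index.
        System.out.println(OperationLog.replay(file));
    }
}
```

A real log would also need a checksum per record and some way to
truncate itself once the index files have been synced; both are left
out here to keep the sketch short.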


On Feb 7, 2008, at 4:29 AM, Michael McCandless wrote:

>
> In fact this is exactly the approach in the final patch on
> LUCENE-1044 and it gives far better performance than the simple
> synchronous (original) approach of syncing every segment file on
> close.
>
> Using a transaction log would also require periodic syncing.
>
> LUCENE-1044 syncs files after every merge, in the background thread  
> of ConcurrentMergeScheduler, which is nice because it does not  
> block further add/update/deleteDocument calls on the writer.
>
> Mike
>
> Andrew Zhang wrote:
>
>> On Feb 7, 2008 7:22 AM, robert engels <rengels@ix.netcom.com> wrote:
>>
>>> That doesn't help. With lazy writing/buffering by the OS, there is
>>> no guarantee that if the last written block is OK, the earlier
>>> blocks in the file are too.
>>>
>>> The OS/drive is going to physically write them in the most efficient
>>> manner. Only after a sync would this hold true (which is what we are
>>> trying to avoid).
>>
>>
>> Hi, how about an asynchronous commit, i.e. using a thread to sync
>> the data?
>>
>> We only need to make sure that all data is written to storage
>> before the next operation?
>>
>>>
>>>
>>> On Feb 6, 2008, at 5:15 PM, DM Smith wrote:
>>>
>>>>
>>>> On Feb 6, 2008, at 5:42 PM, Michael McCandless wrote:
>>>>
>>>>>
>>>>> robert engels wrote:
>>>>>
>>>>>> Do we have any way of determining if a segment is definitely
>>>>>> OK/VALID?
>>>>>
>>>>> The only way I know is the CheckIndex tool, and it's rather slow
>>>>> (and it's not clear that it always catches all corruption).
>>>>
>>>> Just a thought. It seems that the discussion has revolved around
>>>> whether a crash or similar event has left the file in an
>>>> inconsistent state. Without looking into how it is actually done,
>>>> I'm going to guess that the writing is done from the start of the
>>>> file to its end. That is, no "out of order" writing.
>>>>
>>>> If this is the case, how about adding a marker to the end of the
>>>> file of a known size and pattern. If it is present then it is
>>>> presumed that there were no errors in getting to that point.
>>>>
>>>> Even with out of order writing, one could write an 'INVALID' marker
>>>> at the beginning of the operation and then upon reaching the end of
>>>> the writing, replace it with the valid marker.
>>>>
>>>> If neither marker is found then the index is one from before the
>>>> capability was added and nothing can be said about the validity.
>>>>
>>>> -- DM
>>>>
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
>>>> For additional commands, e-mail: java-dev-help@lucene.apache.org
>>>>
>>>
>>>
>>
>>
>> -- 
>> Best regards,
>> Andrew Zhang
>>
>> db4o - database for Android: www.db4o.com
>> http://zhanghuangzhu.blogspot.com/
>
