lucene-dev mailing list archives

From robert engels <reng...@ix.netcom.com>
Subject Re: detected corrupted index / performance improvement
Date Thu, 07 Feb 2008 16:18:40 GMT
I might be misunderstanding LUCENE-1044. There were several approaches, and
I am not certain which one ended up as the final one.

I reread the bug and am still a bit unclear.

If the segments are sync'd as part of the commit, then yes, that
would suffice. The merges don't need to commit; you just can't delete
the segments until the merge completes.
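
A minimal sketch of what "sync'd as part of the commit" could look like, using only java.nio. This is not the LUCENE-1044 patch; the class name SyncOnCommit, the file names, and the plain-text commit point are assumptions made for illustration. A real implementation would likely also sync the containing directory.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Hypothetical helper, not part of Lucene. */
class SyncOnCommit {

    /** Forces each already-written file to stable storage. */
    static void syncAll(List<Path> files) throws IOException {
        for (Path file : files) {
            try (FileChannel ch = FileChannel.open(file, StandardOpenOption.WRITE)) {
                ch.force(true); // flush file data and metadata
            }
        }
    }

    /**
     * Syncs the segment files first, and only then writes and syncs the
     * commit point that names them. Old segments must not be deleted
     * before this method returns.
     */
    static void commit(List<Path> segmentFiles, Path commitPoint) throws IOException {
        syncAll(segmentFiles);
        List<String> names = new ArrayList<>();
        for (Path p : segmentFiles) {
            names.add(p.getFileName().toString());
        }
        Files.write(commitPoint, names, StandardCharsets.UTF_8);
        syncAll(Collections.singletonList(commitPoint));
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("index");
        // Illustrative file names only.
        Path s0 = Files.write(dir.resolve("_0.cfs"), new byte[] {1, 2, 3});
        Path s1 = Files.write(dir.resolve("_1.cfs"), new byte[] {4, 5});
        commit(Arrays.asList(s0, s1), dir.resolve("segments_1"));
        System.out.println(Files.readAllLines(dir.resolve("segments_1")));
    }
}
```

The ordering is the point: the commit point is written only after every file it references has been forced to disk, so a crash leaves either the old commit or a complete new one.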

I think that building the segments and syncing each segment is going
to be slower than writing the documents/operations to a log file,
since in most cases the caller is going to call commit as part of each
update. But a lot depends on how Lucene is used (interactive
vs. batch, lots of updates vs. a few).
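
To make the log-file alternative concrete, here is a rough sketch of an operation log with a synchronous commit and a roll-forward step. Everything in it (the OperationLog class, the length-prefixed record format, operations as strings) is an assumption for illustration and not anything that exists in Lucene.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical operation log, not part of Lucene. */
class OperationLog implements AutoCloseable {
    private final FileChannel channel;

    OperationLog(Path file) throws IOException {
        channel = FileChannel.open(file, StandardOpenOption.CREATE,
                StandardOpenOption.WRITE, StandardOpenOption.APPEND);
    }

    /** Appends one operation as a length-prefixed record (not yet durable). */
    void append(String operation) throws IOException {
        byte[] payload = operation.getBytes(StandardCharsets.UTF_8);
        ByteBuffer record = ByteBuffer.allocate(4 + payload.length);
        record.putInt(payload.length).put(payload).flip();
        while (record.hasRemaining()) {
            channel.write(record);
        }
    }

    /** Synchronous commit: returns only after the log is forced to disk. */
    void commit() throws IOException {
        channel.force(true);
    }

    /**
     * Reads back every complete record so the operations can be re-applied
     * to the index after a failure. An incomplete trailing record is skipped.
     */
    static List<String> replay(Path file) throws IOException {
        ByteBuffer buf = ByteBuffer.wrap(Files.readAllBytes(file));
        List<String> ops = new ArrayList<>();
        while (buf.remaining() >= 4) {
            int len = buf.getInt();
            if (len < 0 || len > buf.remaining()) {
                break; // torn write at the end of the log
            }
            byte[] payload = new byte[len];
            buf.get(payload);
            ops.add(new String(payload, StandardCharsets.UTF_8));
        }
        return ops;
    }

    @Override
    public void close() throws IOException {
        channel.close();
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("oplog", ".bin");
        try (OperationLog log = new OperationLog(file)) {
            log.append("add doc1");
            log.append("delete doc0");
            log.commit();
        }
        System.out.println(replay(file));
    }
}
```

The commit here costs one sequential append plus one force on a single file, regardless of how many segment files the index later writes. A production log would also want a per-record checksum, since the length check alone cannot detect garbage inside a torn record.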

I am not sure how deletions are impacted by all of this.


On Feb 7, 2008, at 9:21 AM, robert engels wrote:

> This is simply not true. Two different issues are at play. You  
> cannot have a true 'commit' unless it is synchronous!
>
> LUCENE-1044 might allow the index to be brought back to a
> consistent state, but not one that is consistent with a
> synchronization point.
>
> For example, I write three documents to the index. I call commit.
> It returns. After this, those documents MUST be in the index under
> any conditions. LUCENE-1044 does not ensure this.
>
> By writing the operations (deletes and updates) to a log file
> first, and syncing the log file, a failure during the index
> writing/merging can be fixed by rolling forward the log.
>
>
> On Feb 7, 2008, at 4:29 AM, Michael McCandless wrote:
>
>>
>> In fact this is exactly the approach in the final patch on
>> LUCENE-1044, and it gives far better performance than the simple
>> synchronous (original) approach of syncing every segment file on
>> close.
>>
>> Using a transaction log would also require periodic syncing.
>>
>> LUCENE-1044 syncs files after every merge, in the background  
>> thread of ConcurrentMergeScheduler, which is nice because it does  
>> not block further add/update/deleteDocument calls on the writer.
>>
>> Mike
>>
>> Andrew Zhang wrote:
>>
>>> On Feb 7, 2008 7:22 AM, robert engels <rengels@ix.netcom.com> wrote:
>>>
>>>> That doesn't help, with lazy writing/buffering by the OS, there is no
>>>> guarantee that if the last written block is ok, that earlier blocks
>>>> in the file are....
>>>>
>>>> The OS/drive is going to physically write them in the most efficient
>>>> manner. Only after a sync would this hold true (which is what we are
>>>> trying to avoid).
>>>
>>>
>>> Hi, how about asynchronous commit? i.e. use a thread to sync the data.
>>>
>>> We only need to make sure that all data are written to the storage before
>>> the next operation?
>>>
>>>>
>>>>
>>>> On Feb 6, 2008, at 5:15 PM, DM Smith wrote:
>>>>
>>>>>
>>>>> On Feb 6, 2008, at 5:42 PM, Michael McCandless wrote:
>>>>>
>>>>>>
>>>>>> robert engels wrote:
>>>>>>
>>>>>>> Do we have any way of determining if a segment is definitely
>>>>>>> OK/VALID?
>>>>>>
>>>>>> The only way I know is the CheckIndex tool, and it's rather slow (and
>>>>>> it's not clear that it always catches all corruption).
>>>>>
>>>>> Just a thought. It seems that the discussion has revolved around
>>>>> whether a crash or similar event has left the file in an
>>>>> inconsistent state. Without looking into how it is actually done,
>>>>> I'm going to guess that the writing is done from the start of the
>>>>> file to its end. That is, no "out of order" writing.
>>>>>
>>>>> If this is the case, how about adding a marker to the end of the
>>>>> file of a known size and pattern. If it is present then it is
>>>>> presumed that there were no errors in getting to that point.
>>>>>
>>>>> Even with out of order writing, one could write an 'INVALID' marker
>>>>> at the beginning of the operation and then upon reaching the end of
>>>>> the writing, replace it with the valid marker.
>>>>>
>>>>> If neither marker is found then the index is one from before the
>>>>> capability was added and nothing can be said about the validity.
>>>>>
>>>>> -- DM
>>>>>
>>>>> ---------------------------------------------------------------------
>>>>> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
>>>>> For additional commands, e-mail: java-dev-help@lucene.apache.org
>>>>>
>>>>
>>>>
>>>
>>>
>>> -- 
>>> Best regards,
>>> Andrew Zhang
>>>
>>> db4o - database for Android: www.db4o.com
>>> http://zhanghuangzhu.blogspot.com/
>>
>>
>>
>
>
>

