lucene-dev mailing list archives

From robert engels <>
Subject Re: detected corrupted index / performance improvement
Date Wed, 06 Feb 2008 23:20:53 GMT
Yes, but this pruning could be made more efficient. On a background
thread, read the current segment from the segments file, then issue a
system-wide sync (e.g. Runtime.getRuntime().exec("sync")). Once the sync
returns, you can purge the transaction logs for all segments up to that
one. Since this is a background operation, it does not block the
writing of new segments and tx logs.
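
A rough sketch of what I mean, not working Lucene code. Everything named
here is hypothetical: the TxLogPruner class, the one-log-file-per-segment
layout ("tx_<generation>.log"), and the assumption that a POSIX "sync"
utility is on the PATH. Java has no built-in system-wide sync, so the
sketch shells out; FileChannel.force() only covers a single file.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical sketch: one tx log per segment generation, named tx_<gen>.log. */
public class TxLogPruner {
    private final Path logDir;

    public TxLogPruner(Path logDir) {
        this.logDir = logDir;
    }

    /** Append one update to the log for this generation and force it to disk. */
    public void append(long generation, String update) throws IOException {
        Path log = logDir.resolve("tx_" + generation + ".log");
        try (FileChannel ch = FileChannel.open(log, StandardOpenOption.CREATE,
                StandardOpenOption.WRITE, StandardOpenOption.APPEND)) {
            ByteBuffer buf = ByteBuffer.wrap((update + "\n").getBytes(StandardCharsets.UTF_8));
            while (buf.hasRemaining()) {
                ch.write(buf);
            }
            ch.force(true); // sequential write + single-file sync: the cheap part
        }
    }

    /** Assumption: a POSIX "sync" command exists; this flushes all dirty buffers. */
    static void systemWideSync() throws IOException, InterruptedException {
        new ProcessBuilder("sync").inheritIO().start().waitFor();
    }

    /** Delete logs for generations <= durableGeneration; returns what was removed. */
    public List<Long> pruneUpTo(long durableGeneration) throws IOException {
        List<Long> removed = new ArrayList<>();
        try (DirectoryStream<Path> logs = Files.newDirectoryStream(logDir, "tx_*.log")) {
            for (Path p : logs) {
                String name = p.getFileName().toString();
                // strip the "tx_" prefix and ".log" suffix to get the generation
                long gen = Long.parseLong(name.substring(3, name.length() - 4));
                if (gen <= durableGeneration) {
                    Files.delete(p);
                    removed.add(gen);
                }
            }
        }
        Collections.sort(removed);
        return removed;
    }

    /**
     * Background pruning: the caller reads the current generation BEFORE the
     * sync, so only segments written before the sync started are treated as durable.
     */
    public Thread pruneInBackground(long generationBeforeSync) {
        Thread t = new Thread(() -> {
            try {
                systemWideSync();
                pruneUpTo(generationBeforeSync);
            } catch (IOException | InterruptedException e) {
                // On failure keep the logs; they are still needed for recovery.
            }
        }, "txlog-pruner");
        t.setDaemon(true);
        t.start();
        return t;
    }
}
```

The ordering is the important part: read the generation first, sync
second, purge third. Writers keep appending to newer logs the whole time
and never wait on the pruner.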

On Feb 6, 2008, at 4:42 PM, Michael McCandless wrote:

> robert engels wrote:
>> Do we have any way of determining if a segment is definitely
>> OK/VALID?
> The only way I know is the CheckIndex tool, and it's rather slow (and
> it's not clear that it always catches all corruption).
>> If so, a much more efficient transactional system could be developed.
>> Serialize the updates to a log file. Sync the log. Update the  
>> lucene index WITHOUT any sync.  Log file writing/sync is VERY  
>> efficient since it is sequential, and a single file.
>> Upon opening the index, detect whether it was not shut down cleanly.
>> If so, determine the last valid segment, delete the bad segments,
>> and then replay the updates (from the log file) made since the last
>> valid segment was written.
>> The detection could be a VERY slow operation, but this is ok,  
>> since it should be rare, and then you will only pay this price on  
>> the rare occasion, not on every update.
> Wouldn't you still need to sync periodically, so you can prune the
> transaction log?  Else your transaction log is growing as fast as the
> index?  (You've doubled disk usage).
> Mike
> ---------------------------------------------------------------------
> To unsubscribe, e-mail:
> For additional commands, e-mail:
