lucene-dev mailing list archives

From Michael McCandless <luc...@mikemccandless.com>
Subject Re: detected corrupted index / performance improvement
Date Wed, 06 Feb 2008 22:42:27 GMT

robert engels wrote:

> Do we have any way of determining if a segment is definitely
> OK/VALID?

The only way I know is the CheckIndex tool, and it's rather slow (and
it's not clear that it always catches all corruption).

> If so, a much more efficient transactional system could be developed.
>
> Serialize the updates to a log file. Sync the log. Update the  
> lucene index WITHOUT any sync.  Log file writing/sync is VERY  
> efficient since it is sequential, and a single file.
>
> Upon opening the index, detect whether it was not shut down cleanly.
> If so, determine the last valid segment, delete the bad segments,
> and then reapply the updates (from the log file) made since the last
> valid segment was written.
>
> The detection could be a VERY slow operation, but this is ok, since  
> it should be rare, and then you will only pay this price on the  
> rare occasion, not on every update.

Wouldn't you still need to sync the index periodically, so you can
prune the transaction log?  Otherwise the log grows as fast as the
index itself, and you've doubled disk usage.
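
To make the trade-off concrete, here is a rough Java sketch of the
log-then-prune cycle. It is an illustration only: UpdateLog, append,
replay and prune are hypothetical names, nothing here is Lucene API,
and updates are modeled as plain strings. It assumes the caller has
already synced the index before calling prune, which is exactly the
periodic sync in question.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sequential transaction log; not part of Lucene. */
final class UpdateLog implements AutoCloseable {
    private final FileChannel channel;

    UpdateLog(Path path) throws IOException {
        channel = FileChannel.open(path,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE);
        channel.position(channel.size()); // continue after existing records
    }

    /** Append one length-prefixed record and sync the log before returning. */
    void append(String update) throws IOException {
        byte[] payload = update.getBytes(StandardCharsets.UTF_8);
        ByteBuffer buf = ByteBuffer.allocate(4 + payload.length);
        buf.putInt(payload.length);
        buf.put(payload);
        buf.flip();
        while (buf.hasRemaining()) {
            channel.write(buf);
        }
        channel.force(false); // the only sync paid on every update
    }

    /** Read back all complete records; a torn record at the tail is ignored. */
    static List<String> replay(Path path) throws IOException {
        List<String> updates = new ArrayList<>();
        if (!Files.exists(path)) {
            return updates;
        }
        ByteBuffer buf = ByteBuffer.wrap(Files.readAllBytes(path));
        while (buf.remaining() >= 4) {
            int len = buf.getInt();
            if (len < 0 || len > buf.remaining()) {
                break; // incomplete write from a crash
            }
            byte[] payload = new byte[len];
            buf.get(payload);
            updates.add(new String(payload, StandardCharsets.UTF_8));
        }
        return updates;
    }

    /**
     * Discard all logged updates. Assumption: the caller has already
     * synced the index, so these records are no longer needed for recovery.
     */
    void prune() throws IOException {
        channel.truncate(0); // also resets the write position to 0
        channel.force(false);
    }

    @Override
    public void close() throws IOException {
        channel.close();
    }
}
```

Without the prune step the log keeps every update ever made, which
is where the doubled disk usage comes from.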

Mike
