lucene-dev mailing list archives

From Michael McCandless
Subject Re: CheckIndex tool
Date Tue, 05 Aug 2008 15:00:10 GMT

Actually, those exceptions are thrown by the code detecting the
mismatch, and then caught by CheckIndex and handled as meaning that
the segment is corrupt.  This is consistent, e.g., with how Lucene
throws CorruptIndexException deep down if it hits an inconsistency.

I think it's fine if you want to not use exceptions for the "local"  
mismatches, and instead record the error in a data structure and then  
stop processing that one segment.  But for the "deep down" exceptions  
you still have to keep the catch in CheckIndex to record those.
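
Roughly, the two paths could look like the sketch below. This is only an
illustration under assumptions: every name in it (CheckIndexSketch,
SegmentStatus, Segment, DeepCorruptionException, readDocCount) is
hypothetical and not the actual CheckIndex code, and the exception class
merely stands in for something like CorruptIndexException. It also folds in
the optional infoStream/message() idea from earlier in the thread.

```java
import java.io.PrintStream;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; none of these names are Lucene's actual API.
class CheckIndexSketch {

    // Stand-in for an exception thrown "deep down" while reading a segment.
    static class DeepCorruptionException extends RuntimeException {
        DeepCorruptionException(String msg) { super(msg); }
    }

    // Hypothetical description of a segment to check.
    static class Segment {
        final String name;
        final int expectedDocs;
        final int actualDocs;
        final boolean failsOnRead;
        Segment(String name, int expectedDocs, int actualDocs, boolean failsOnRead) {
            this.name = name;
            this.expectedDocs = expectedDocs;
            this.actualDocs = actualDocs;
            this.failsOnRead = failsOnRead;
        }
    }

    // Per-segment result that callers can inspect programmatically.
    static class SegmentStatus {
        final String name;
        String error; // null means no problem was found
        SegmentStatus(String name) { this.name = name; }
        boolean isClean() { return error == null; }
    }

    private final PrintStream infoStream; // optional; null disables output

    CheckIndexSketch(PrintStream infoStream) { this.infoStream = infoStream; }

    // Progress output happens only when an infoStream was supplied.
    private void message(String s) {
        if (infoStream != null) infoStream.println(s);
    }

    List<SegmentStatus> check(List<Segment> segments) {
        List<SegmentStatus> results = new ArrayList<>();
        for (Segment seg : segments) {
            SegmentStatus status = new SegmentStatus(seg.name);
            results.add(status);
            message("checking segment " + seg.name);
            try {
                int docs = readDocCount(seg);
                if (docs != seg.expectedDocs) {
                    // "Local" mismatch: record it and stop processing this segment.
                    status.error = "doc count mismatch: expected "
                            + seg.expectedDocs + " but found " + docs;
                    message("  FAILED: " + status.error);
                    continue;
                }
                message("  OK");
            } catch (RuntimeException e) {
                // "Deep down" failure: the catch stays so it is recorded too.
                status.error = "exception: " + e.getMessage();
                message("  FAILED: " + status.error);
            }
        }
        return results;
    }

    // Simulates lower-level reading code that may throw on corruption.
    static int readDocCount(Segment seg) {
        if (seg.failsOnRead) {
            throw new DeepCorruptionException("unreadable segment " + seg.name);
        }
        return seg.actualDocs;
    }
}
```

A caller wanting programmatic access would pass null for the infoStream and
walk the returned list; the command line main would pass System.out so
progress shows up in realtime.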


On Aug 5, 2008, at 9:30 AM, Grant Ingersoll wrote:

> I'll look into these.  The other part I am not sure about is the
> throwing of exceptions for mismatches.  I know they mean CheckIndex
> can't go forward, but they aren't really errors in CheckIndex so
> much as errors in the index, which CheckIndex is just reporting.
> So, I'm inclined to capture that and present it (and return
> immediately) instead of throwing an exception.  Is that reasonable?
> -Grant
> On Aug 4, 2008, at 5:01 PM, Michael McCandless wrote:
>> This sounds good!  I like the idea of checking the index when Solr  
>> has to force release the write.lock.
>> The one caveat is, when checking a large index (which can take  
>> quite some time), it'd be nice to have the equivalent of the  
>> inline'd out.print/ln calls happen in realtime so that you can see  
>> (on the command line output) that progress is being made, which  
>> segment is being checked, etc.?
>> Maybe change it to an optional "infoStream" (like IndexWriter), and  
>> then the current inlined prints become calls to message() which  
>> checks if infoStream is non-null?
>> Mike
>> Grant Ingersoll wrote:
>>> Hey Mike,
>>> I'm thinking about  
>>> and was thinking about adding some more programmatic access to the  
>>> CheckIndex tool and wanted to see if you had any thoughts.   
>>> Basically, I am going to capture info into a simple data  
>>> structure that can then be introspected and serialized into a  
>>> RequestHandler, but also something that might be more generally  
>>> useful in certain cases where things go bad.  I was debating  
>>> keeping the inline out.printlns, but not sure if they shouldn't  
>>> just be moved to the main such that the cmd line stuff still works  
>>> as is, but it doesn't clog the logs for those that want  
>>> programmatic access.
>>> I'll post a patch soon, but wanted to see if you had any  
>>> preliminary insight.
>>> -Grant
