lucene-dev mailing list archives

From Dmitry Serebrennikov
Subject Re: Making Lucene Transactional
Date Fri, 28 Jun 2002 16:45:15 GMT
>>>-----Original Message-----
>>>From: Doug Cutting []
>>>Sent: Thursday, June 27, 2002 10:36 AM
>>>To: Lucene Users List
>>>Subject: Re: Stress Testing Lucene
>>>It's very hard to leave an index in a bad state.  Updating the 
>>>"segments" file atomically updates the index.  So the only way to 
>>>corrupt things is to only partly update the segments file.  
>>>But that too 
>>>is hard, since it's first written to a temporary file, which is then 
>>>renamed "segments".  
We could further protect against this by writing a checksum at the end
of the temporary segments file, then re-reading the file and verifying
the checksum before renaming it to "segments". That way only fully
written segments files ever become active.
The same checksum could also be used to verify the integrity of the
other index segment components. There is still a chance that the disk
driver is caching the writes, in which case the re-read would not prove
that the data actually reached the disk.
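
A rough sketch of the checksum idea, to make it concrete. This is not existing Lucene code: the class and method names are made up for illustration, the 8-byte CRC32 trailer is just one possible checksum format, and it uses the modern java.nio.file API. Note that the re-read may be served from a cache, as mentioned above.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.zip.CRC32;

// Hypothetical helper, not part of Lucene.
final class ChecksummedSegments {

    /** Returns payload followed by an 8-byte CRC32 trailer (assumed format). */
    static byte[] withChecksum(byte[] payload) {
        CRC32 crc = new CRC32();
        crc.update(payload, 0, payload.length);
        return ByteBuffer.allocate(payload.length + Long.BYTES)
                .put(payload)
                .putLong(crc.getValue())
                .array();
    }

    /** True if the trailing 8 bytes match the CRC32 of everything before them. */
    static boolean isValid(byte[] fileBytes) {
        if (fileBytes.length < Long.BYTES) {
            return false; // too short to even hold the trailer
        }
        int payloadLen = fileBytes.length - Long.BYTES;
        CRC32 crc = new CRC32();
        crc.update(fileBytes, 0, payloadLen);
        long stored = ByteBuffer.wrap(fileBytes, payloadLen, Long.BYTES).getLong();
        return stored == crc.getValue();
    }

    /** Writes segments.tmp, verifies it by re-reading, then renames it to segments. */
    static void commit(Path indexDir, byte[] payload) throws IOException {
        Path tmp = indexDir.resolve("segments.tmp");
        Path live = indexDir.resolve("segments");

        Files.write(tmp, withChecksum(payload));

        // Re-read and verify before the file is allowed to become active.
        if (!isValid(Files.readAllBytes(tmp))) {
            Files.deleteIfExists(tmp);
            throw new IOException("segments.tmp failed checksum verification");
        }

        // REPLACE_EXISTING is not guaranteed to be atomic on every platform.
        Files.move(tmp, live, StandardCopyOption.REPLACE_EXISTING);
    }
}
```

A reader could call `isValid` on the bytes of "segments" before trusting it, which is the same check the writer performs before the rename.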

>>>The only vulnerability I know of is that in Java on Win32 you can't
>>>atomically rename a file to something that already exists, so Lucene
>>>has to first remove the old version.  So if you were to crash between
>>>the time that the old version of "segments" is removed and the new
>>>version is moved into place, then the index would be corrupt, because
>>>it would have no "segments" file.
Perhaps we could also protect against this one by simply removing the
old segments file (is that atomic by itself?) and then letting the next
IndexReader that finds no "segments" file look for the temporary file
and rename it. Two competing IndexReaders might do the "segments" check
at the same time, both find it missing, and both try to rename
"segments.tmp". In that case only the first rename will succeed; the
second reader will find that "segments.tmp" is gone (or that a
"segments" file already exists), and should simply look for the
"segments" file again and proceed.

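The reader-side recovery described above might look roughly like this. Again, this is only a sketch: `SegmentsRecovery` and `locateSegments` are hypothetical names, not IndexReader methods, and it relies on `Files.move` without `REPLACE_EXISTING` failing when the target already exists.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not part of Lucene.
final class SegmentsRecovery {

    /**
     * Returns the path of the live "segments" file, promoting "segments.tmp"
     * if a writer crashed after deleting the old file but before renaming.
     */
    static Path locateSegments(Path indexDir) throws IOException {
        Path live = indexDir.resolve("segments");
        Path tmp = indexDir.resolve("segments.tmp");

        if (Files.exists(live)) {
            return live; // normal case
        }

        try {
            // Fails if tmp is missing or if another reader already created "segments".
            Files.move(tmp, live);
        } catch (IOException lostRaceOrNoTmp) {
            // Either way, fall through and look for "segments" again.
        }

        if (Files.exists(live)) {
            return live;
        }
        throw new IOException("neither segments nor segments.tmp found in " + indexDir);
    }
}
```

The reader that loses the race ends up in the catch block and then finds the "segments" file that the winner just put in place.
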
Would these two changes make the index at least as reliable as the disk 