lucene-dev mailing list archives

From robert engels
Subject Re: [jira] Commented: (LUCENE-1044) Behavior on hard power shutdown
Date Fri, 30 Nov 2007 13:31:21 GMT
My reading of the Unix specification is that it should work. The
behavior of _commit under Windows is less clear, and since Windows is
not inode-based, there may be different issues.
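
The approach under discussion (queue the file names, close the files,
then have a background thread open, sync, and close each one) can be
sketched roughly as follows. This is only a minimal sketch and not the
LUCENE-1044 patch: the class DeferredSyncer and its method names are
hypothetical, and whether a sync through a freshly opened descriptor
flushes writes made through earlier descriptors on every OS is exactly
the open question in this thread.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

/** Hypothetical helper (not part of Lucene): syncs already-closed files by path. */
class DeferredSyncer {
    // Paths of files that were written and closed, but not yet synced.
    private final BlockingQueue<String> pending = new LinkedBlockingQueue<>();

    /** Called by the writer after it has closed the file. */
    public void enqueue(String path) {
        pending.add(path);
    }

    /** Re-opens the file, syncs through the new descriptor, and closes it. */
    static void reopenAndSync(String path) throws IOException {
        try (RandomAccessFile file = new RandomAccessFile(path, "rw")) {
            // Assumption under test in this thread: this also flushes writes
            // that were made through descriptors that are closed by now.
            file.getFD().sync();
        }
    }

    /** Drains the queue and returns the number of files synced. */
    public int syncPending() throws IOException {
        int count = 0;
        String path;
        while ((path = pending.poll()) != null) {
            reopenAndSync(path);
            count++;
        }
        return count;
    }
}
```

A background thread would call syncPending() periodically, or once
before a commit point is made visible.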

On Nov 30, 2007, at 7:10 AM, Michael McCandless (JIRA) wrote:

> Michael McCandless commented on LUCENE-1044:
> --------------------------------------------
> {quote}
> You could just queue the file names for sync, close them, and then  
> have the background thread open, sync and close them. The close  
> could trigger the OS to sync things faster in the background. Then  
> the open/sync/close could mostly be a no-op. Might be worth a try.
> {quote}
> I am taking this approach now, but one nagging question I have is: do
> we know with some certainty that re-opening a file and then sync'ing
> it in fact syncs all writes that were ever done to this file in this
> JVM, even through previously opened and now closed descriptors?  Or
> does it, e.g., sync only the new writes done through that particular
> descriptor?
> In code:
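> {code}
> RandomAccessFile file = new RandomAccessFile(path, "rw");
> // ... do many writes to file ...
> file.close();
> // Re-open the same path and sync through the new descriptor
> RandomAccessFile reopened = new RandomAccessFile(path, "rw");
> reopened.getFD().sync();
> reopened.close();
> {code}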
> Are we pretty sure that all of the "many writes" will in fact be
> sync'd by that sync call, on all OSs?
> I haven't been able to find convincing evidence one way or another.  I
> did run a timing test comparing overall time if you sync with the same
> descriptor you used for writing vs closing it, opening a new one, and
> syncing with that one.  On Linux at least, both approaches seem to be
> syncing, because the total elapsed time is roughly the same.
> Robert, do you know?
> I sure hope the answer is yes ... because if not, the alternative is
> we must sync() before closing the original descriptor, which makes
> things less flexible because, e.g., we cannot cleanly implement
> IndexWriter.commit().
>> Behavior on hard power shutdown
>> -------------------------------
>>                 Key: LUCENE-1044
>>                 URL: 
>> LUCENE-1044
>>             Project: Lucene - Java
>>          Issue Type: Bug
>>          Components: Index
>>         Environment: Windows Server 2003, Standard Edition, Sun  
>> Hotspot Java 1.5
>>            Reporter: venkat rangan
>>            Assignee: Michael McCandless
>>             Fix For: 2.3
>>         Attachments:, LUCENE-1044.patch,  
>> LUCENE-1044.take2.patch, LUCENE-1044.take3.patch,  
>> LUCENE-1044.take4.patch
>> When indexing a large number of documents, upon a hard power  
>> failure (e.g. pull the power cord), the index seems to get  
>> corrupted. We start a Java application as a Windows Service, and  
>> feed it documents. In some cases (after an index size of 1.7GB,  
>> with 30-40 index segment .cfs files), the following is observed.
>> The 'segments' file contains only zeros. Its size is 265 bytes -  
>> all bytes are zeros.
>> The 'deleted' file also contains only zeros. Its size is 85 bytes  
>> - all bytes are zeros.
>> Before corruption, the segments file and deleted file appear to be  
>> correct. After this corruption, the index is corrupted and lost.
>> This is a problem observed in Lucene 1.4.3. We are not able to  
>> upgrade our customer deployments to 1.9 or later version, but  
>> would be happy to back-port a patch, if the patch is small enough  
>> and if this problem is already solved.
> -- 
> This message is automatically generated by JIRA.
> -
> You can reply to this email to add a comment to the issue online.
> ---------------------------------------------------------------------
> To unsubscribe, e-mail:
> For additional commands, e-mail:
