lucene-dev mailing list archives

From Michael Busch <busch...@gmail.com>
Subject IndexWriter shutdown
Date Thu, 17 May 2007 13:17:30 GMT
Hi,

If you run Lucene as a service, you want to be able to shut it down within a 
certain period of time (usually 1-2 minutes). This can be a problem if the 
IndexWriter is in the middle of a merge when the service shutdown 
request is received.

Therefore it would be nice if we had a method in IndexWriter called, e.g., 
shutdown(), which satisfies the following two requirements:
- if a merge is happening, abort it
- flush the buffered docs but do not trigger a merge

The latter is easy: we just need a flush method that does not trigger a 
merge. That's a two line change in IndexWriter.

The former is more complex. The first way of implementing this that came 
to my mind was to add checks to the different merge loops, like "only 
continue if shutdown hasn't been called yet". The obvious drawback of 
this approach is a performance impact and the need to make code changes 
in different places: merging fields, merging postings, merging 
termvectors, writing compound files. So I think this is quite an ugly 
approach.
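
To make the drawback concrete, here is a minimal sketch of what such a 
check looks like in a single loop. It is not Lucene code: 
CooperativeMergeSketch, mergeLoop() and the shutdownAfter parameter are 
hypothetical names used only for illustration.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical stand-in for a single merge loop; this is not Lucene code.
class CooperativeMergeSketch {
    private final AtomicBoolean shutdownRequested = new AtomicBoolean(false);

    void shutdown() {
        shutdownRequested.set(true);
    }

    // Processes up to totalItems items and returns how many were processed.
    // shutdownAfter simulates a shutdown request arriving after that many
    // items; pass a negative value for "no shutdown".
    int mergeLoop(int totalItems, int shutdownAfter) {
        int processed = 0;
        for (int i = 0; i < totalItems; i++) {
            // The extra check that every merge loop would need.
            if (shutdownRequested.get()) {
                break;
            }
            processed++;
            if (processed == shutdownAfter) {
                shutdown();
            }
        }
        return processed;
    }

    public static void main(String[] args) {
        System.out.println(new CooperativeMergeSketch().mergeLoop(10, 3));
        System.out.println(new CooperativeMergeSketch().mergeLoop(10, -1));
    }
}
```

The same check would have to be repeated in each of the loops listed 
above, which is what makes this approach unattractive.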

The approach I implemented is sort of a hack, but I'd like to describe 
it briefly here. I extended the FSDirectory and FSIndexOutput:

    public static class ExtendedFSDirectory extends FSDirectory {
        // volatile: set by the shutdown thread, read by the merging thread
        private volatile boolean interrupted = false;
        
        public void interrupt() {
            this.interrupted = true;
        }
        
        public void clearInterrupt() {
            this.interrupted = false;
        }
        
        public IndexOutput createOutput(String name) throws IOException {
            File file = new File(getFile(), name);
            // delete existing, if any
            if (file.exists() && !file.delete())
              throw new IOException("Cannot overwrite: " + file);

            return new FSIndexOutput(file) {
                public void flushBuffer(byte[] b, int offset, int size)
                        throws IOException {
                    if (ExtendedFSDirectory.this.interrupted) {
                        throw new IndexWriterInterruptException();
                    }
                    
                    super.flushBuffer(b, offset, size);
                }

            };
        }
    }
    
    // This exception is used to signal an interrupt request    
    static final class IndexWriterInterruptException extends IOException {
        private static final long serialVersionUID = 1L;
    }

So now FSIndexOutput.flushBuffer() throws an 
IndexWriterInterruptException in case interrupt() has been called. This 
causes the IndexWriter to abort the merge and to roll back the transaction.
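
The mechanism can be tried out without Lucene. The following is only a 
sketch under stated assumptions: Dir, Output and simulateMerge() are 
hypothetical stand-ins for the directory, the buffered index output and 
a merge. It shows a write loop being aborted at the next buffer flush 
once the flag is set.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;

// Toy model of the interrupt-on-flush idea; none of these types are Lucene classes.
class InterruptibleOutputSketch {
    static final class InterruptException extends IOException {
        private static final long serialVersionUID = 1L;
    }

    // Plays the role of the directory: it owns the interrupt flag.
    static final class Dir {
        private volatile boolean interrupted = false;

        void interrupt() { interrupted = true; }
        void clearInterrupt() { interrupted = false; }
        Output createOutput(int bufferSize) { return new Output(this, bufferSize); }
    }

    // Plays the role of the buffered output: it checks the flag on every flush.
    static final class Output {
        private final Dir dir;
        private final byte[] buffer;
        private int used = 0;
        private final ByteArrayOutputStream sink = new ByteArrayOutputStream();

        Output(Dir dir, int bufferSize) {
            this.dir = dir;
            this.buffer = new byte[bufferSize];
        }

        void writeByte(byte b) throws IOException {
            if (used == buffer.length) {
                flushBuffer();
            }
            buffer[used++] = b;
        }

        void flushBuffer() throws IOException {
            if (dir.interrupted) {
                throw new InterruptException();
            }
            sink.write(buffer, 0, used);
            used = 0;
        }

        int bytesFlushed() { return sink.size(); }
    }

    // Writes n bytes and requests an interrupt just before byte number
    // interruptAt (negative means never). Returns true if the write completed.
    static boolean simulateMerge(Dir dir, Output out, int n, int interruptAt) {
        try {
            for (int i = 0; i < n; i++) {
                if (i == interruptAt) {
                    dir.interrupt();
                }
                out.writeByte((byte) i);
            }
            out.flushBuffer();
            return true;
        } catch (IOException e) {
            return false; // the caller would roll back here
        }
    }

    public static void main(String[] args) {
        Dir dir = new Dir();
        Output out = dir.createOutput(4);
        System.out.println("completed: " + simulateMerge(dir, out, 16, 6));
        System.out.println("bytes flushed: " + out.bytesFlushed());
    }
}
```

Note that the abort only takes effect at the next flush, so the latency 
of the interrupt depends on the buffer size and on how fast the merge 
is writing.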

I have another class that extends IndexWriter and overrides the 
addDocument() and updateDocument() methods. In these methods I catch the 
IndexWriterInterruptException. If it is thrown, 
IndexWriter.flushRamSegments(boolean triggerMerge) is called with 
triggerMerge=false.

An advantage of this implementation is that almost all changes can be 
made on top of Lucene. The only core change is the protected method 
flushRamSegments(boolean triggerMerge) in IndexWriter.
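
Roughly, the control flow in the subclass looks like the sketch below. 
This is a toy model, not my actual code and not the real IndexWriter: 
ToyWriter, its list-based "segments" and the demo() helper are 
hypothetical, and the rollback is simplified to removing the segment 
that was just written.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Toy model of "catch the interrupt, then flush without merging".
class ShutdownAwareWriterSketch {
    static final class InterruptException extends IOException {
        private static final long serialVersionUID = 1L;
    }

    // Hypothetical stand-in for IndexWriter: buffers docs, flushes them as
    // segments, and merges all segments into one when a merge is triggered.
    static class ToyWriter {
        final List<String> buffered = new ArrayList<>();
        final List<List<String>> segments = new ArrayList<>();
        volatile boolean interrupted = false;
        final int maxBuffered = 2;

        void addDocument(String doc) throws IOException {
            buffered.add(doc);
            if (buffered.size() >= maxBuffered) {
                flushRamSegments(true);
            }
        }

        protected void flushRamSegments(boolean triggerMerge) throws IOException {
            if (buffered.isEmpty()) {
                return;
            }
            segments.add(new ArrayList<>(buffered));
            try {
                if (triggerMerge && segments.size() > 1) {
                    merge();
                }
            } catch (IOException e) {
                // Roll back: drop the new segment, keep the docs buffered.
                segments.remove(segments.size() - 1);
                throw e;
            }
            buffered.clear();
        }

        private void merge() throws IOException {
            if (interrupted) {
                throw new InterruptException(); // aborted before changing anything
            }
            List<String> merged = new ArrayList<>();
            for (List<String> segment : segments) {
                merged.addAll(segment);
            }
            segments.clear();
            segments.add(merged);
        }
    }

    // The subclass: on interrupt, flush the buffered docs without a merge.
    static class ShutdownAwareWriter extends ToyWriter {
        @Override
        void addDocument(String doc) throws IOException {
            try {
                super.addDocument(doc);
            } catch (InterruptException e) {
                flushRamSegments(false);
            }
        }
    }

    // Adds four docs; returns {segment count, buffered doc count}.
    static int[] demo(boolean interruptBeforeSecondFlush) {
        ShutdownAwareWriter writer = new ShutdownAwareWriter();
        try {
            writer.addDocument("a");
            writer.addDocument("b");
            writer.interrupted = interruptBeforeSecondFlush;
            writer.addDocument("c");
            writer.addDocument("d");
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
        return new int[] { writer.segments.size(), writer.buffered.size() };
    }

    public static void main(String[] args) {
        System.out.println("segments after interrupt: " + demo(true)[0]);
        System.out.println("segments without interrupt: " + demo(false)[0]);
    }
}
```

In the interrupted case the buffered docs end up in their own unmerged 
segment, and nothing is lost. In the real setup, where the flag lives 
in the directory, it would presumably have to be cleared via 
clearInterrupt() before the final flush.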

My question is whether people think the shutdown feature is something we 
would like to add to the Lucene core. If yes, I can go ahead and attach 
my code to a JIRA issue; if not, I'd still like to make the small change 
to IndexWriter (add the protected method flushRamSegments(triggerMerge)). 
My approach seems to work quite well, but maybe others (e.g. the 
IndexWriter "experts") have different or better ideas on how to implement it.

Thanks,
Michael


