lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Michael McCandless <luc...@mikemccandless.com>
Subject Re: replication question
Date Tue, 16 Dec 2008 21:22:17 GMT

It's better to use SnapshotDeletionPolicy to grab a consistent image  
of the index.  You don't need to close the IndexWriter, nor stop  
making changes through IndexWriter, and it lets you capture a given  
segments_N (and all index files it needs) and then take your time  
making a copy/backup/etc of all files in the snapshot.

There's a "green paper", excerpted from the upcoming Lucene in Action  
revision, that covers how to use SnapshotDeletionPolicy for backing up  
an index:

   http://manning.com/free/green_HotBackupsLucene.html

(Disclaimers: 1) I wrote the article, 2) The link is frustrating  
because you have to submit your email address, then get email w/ a  
link that gives you a zip file, which you then unzip and open the  
index.html... I've been meaning to post the article directly to the  
Wiki so now seems like a good time!).

Mike

Michael Stoppelman wrote:

> Hi Yonik,
>
> Thanks for the response.
>
> reply inline.
>
> On Tue, Dec 16, 2008 at 6:44 AM, Yonik Seeley <yseeley@gmail.com>  
> wrote:
>
>> On Tue, Dec 16, 2008 at 1:04 AM, Michael Stoppelman <stopman@gmail.com 
>> >
>> wrote:
>>> I've got a question from Doug's original email about replication (
>>> http://www.mail-archive.com/lucene-user@jakarta.apache.org/msg12709.html
>> ):
>>>
>>> "1. On the index master, periodically checkpoint the index. Every  
>>> minute
>> or
>>> so the IndexWriter is closed and a 'cp -lr index index.DATE'  
>>> command is
>>> executed from Java, where DATE is the current date and time. This
>>> efficiently makes a copy of the index when its in a consistent  
>>> state by
>>> constructing a tree of hard links. If Lucene re-writes any files  
>>> (e.g.,
>> the
>>> segments file) a new inode is created and the copy is unchanged."
>>>
>>> Is closing the IndexWriter really a requirement on taking a  
>>> snapshot? Or
>> can
>>> one take a snapshot on an index being written, I've done this in my
>>> development environment and it seems to work fine w/o closing the
>>> IndexWriter.
>>
>> There are subtle race conditions if you try to do this with a  
>> changing
>> index.
>> At any instance in time, the index should be consistent, *but* you
>> can't actually make a snapshot instantaneously.
>
>
> Is the race condition in writing out the segments.gen or segments_N  
> files?
> From my understanding index segments once closed by the IndexWriter  
> they
> aren't modified again (they might be deleted though if they're  
> merged away).
>
>
>>
>> So this is doable, but it would require some complex retry logic like
>> IndexReader has when opening an index.
>>
>>> Also the solr replication shell scripts don't seem to worry
>>> about this either.
>>
>> Solr takes snapshots when it knows it's not updating the index (new
>> index changes are internally blocked when calling snapshooter).
>>
>> -Yonik
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>
>>


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message