lucene-java-user mailing list archives

From Shai Erera <ser...@gmail.com>
Subject Re: Taking backup of a Lucene index
Date Thu, 06 Jun 2013 10:23:20 GMT
Hi

Taking a backup of the index by doing a naive file copy is not a good
approach. As you mentioned, Lucene merges segments in the background, and if
your application commits while the copy is running, old segment files may be
deleted before they are copied. Your backup will also most probably include
files that were not committed yet.

Rather, you should use SnapshotDeletionPolicy to take a snapshot of the
index, then copy all the files referenced by the snapshot.
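
A minimal sketch of the copy step, assuming the snapshot has already given you
its list of file names. SnapshotBackup and copySnapshotFiles are hypothetical
names used only for illustration, not part of Lucene. The Lucene calls appear
only in the comment, written as I recall them from the 4.4 API; signatures
differ between versions, so check the Javadoc for the release you run.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Collection;

/** Hypothetical helper (not part of Lucene): copies the files of one snapshot. */
final class SnapshotBackup {

    private SnapshotBackup() {}

    /**
     * Copies exactly the named files from indexDir to backupDir and returns
     * how many files were copied. Files not named are left alone, so
     * uncommitted files in the index directory never reach the backup.
     *
     * Assumed usage with Lucene 4.4+, where snapshotter is the
     * SnapshotDeletionPolicy that was set on the IndexWriterConfig:
     *
     *   IndexCommit commit = snapshotter.snapshot();
     *   try {
     *       SnapshotBackup.copySnapshotFiles(indexDir, backupDir, commit.getFileNames());
     *   } finally {
     *       snapshotter.release(commit);  // allow Lucene to delete these files again
     *   }
     */
    static int copySnapshotFiles(Path indexDir, Path backupDir, Collection<String> fileNames)
            throws IOException {
        Files.createDirectories(backupDir);
        int copied = 0;
        for (String name : fileNames) {
            // The snapshot keeps these files alive, so none can vanish mid-copy.
            Files.copy(indexDir.resolve(name), backupDir.resolve(name),
                    StandardCopyOption.REPLACE_EXISTING);
            copied++;
        }
        return copied;
    }
}
```

Releasing the snapshot in a finally block matters: as long as it is held, the
files it references stay on disk even after later commits and merges.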

You can also try the new Replicator module (will be available in Lucene
4.4) to take periodic backups of the index with very few steps required on
your end.
You can read about it here:
http://shaierera.blogspot.com/2013/05/the-replicator.html

Shai


On Thu, Jun 6, 2013 at 11:14 AM, Daniel Penning <dpenning@gamona.de> wrote:

> I do my backups by creating a new index at the backup target and copying
> everything over with IndexWriter#addIndexes(IndexReader... readers). In
> the future I am also planning to use a RateLimitedDirectoryWrapper to
> reduce the impact of the running backup on the rest of the system.
>
> On 06.06.2013 09:43, Thomas Matthijs wrote:
>
>> On Thu, Jun 6, 2013 at 7:38 AM, Lance Norskog <goksron@gmail.com> wrote:
>>
>>> The simple answer (that somehow nobody gave) is that you can make a copy
>>> of an index directory at any time. Indexes are changed in "generations".
>>> The segment* files describe the current generation of files. All active
>>> indexing goes on in new files. In a commit, all new files are flushed to
>>> disk and then the segment* files change. At any point in this sequence,
>>> all of the files in the directory form one consistent index.
>>>
>>> This isn't like MySQL or other databases where you have to shut down the
>>> DB to get a safe copy of the files.
>>>
>>
>> If you just do a naive copy that gets a file list first and then copies
>> the files, segments can be merged during the copy and deleted by Lucene,
>> resulting in an incomplete backup. That is why you need the snapshot
>> policy to keep them around until the copy is completed.
>>
>> If you have very few updates and don't mind risking a broken index, or you
>> just loop rsync until both sides are equal, you indeed don't need anything
>> else.
>>
>>
>
