Subject: Re: Problem doing backup using the SnapshotDeletionPolicy class
From: Shai Erera
To: java-user@lucene.apache.org
Date: Mon, 17 Aug 2009 21:48:39 +0300

The way I'd do the backups is by having a background thread that is
scheduled to run every X hours/days (depending on how frequently you want
to do the backups), and when it wakes up it:

1) Creates a SnapshotDeletionPolicy and retrieves the file names.
2) Lists the files in the backup folder.
3) Copies the files from (1) that do not appear in (2).
4) Deletes the files in (2) that do not appear in (1) --> that deletes
   segments that do not exist anymore.

But that is because I want to do the backups while the system is up and
running, and indexing and search operations occur.

From what you describe below, you index in batches and then indexing
stops. So you can just do a directory listing (after everything stops -
indexing, optimize ...)
and basically repeat steps (2) - (4) from above, I think.

Hope this helps,
Shai

On Mon, Aug 17, 2009 at 3:59 PM, Lucas Nazário dos Santos <
nazario.lucas@gmail.com> wrote:

> Thanks Mike.
>
> I'm using Windows XP with Java 1.6.0 and Lucene 2.4.1.
>
> I don't know if I'm using the right backup strategy. I have an indexing
> process that happens from time to time, and the index is getting bigger
> every day. Hence, copying the whole index every time as a backup
> strategy is becoming a painful, endless activity. Given this scenario, I
> wouldn't like to copy the entire index each time I do a backup, but only
> the difference from the previous backup. Here is the plan:
>
> 1. Create an index writer;
> 2. Take a snapshot to prevent existing files from changing;
> 3. Start indexing until no more documents to be indexed exist;
> 4. Retrieve a list of index files that didn't change with
>    this.snapshotDeletionPolicy.snapshot().getFileNames();
> 5. Copy all files inside the index folder other than those retrieved in
>    step 4 to the backup directory;
> 6. Optimize and close the index.
>
> I really don't know if what I'm doing is the best approach to my
> problem. I believe that people use the SnapshotDeletionPolicy to copy
> all of the index up to the last commit. I want to copy only the
> difference since the last backup.
>
> What I'm thinking about doing now is to index into a new directory at
> every indexing iteration, copy this index to the backup folder, and
> merge it with the main index afterwards.
>
> What do you guys think I should do?
>
> Lucas
>
>
> On Fri, Aug 14, 2009 at 8:05 PM, Michael McCandless <
> lucene@mikemccandless.com> wrote:
>
> > Alas I don't see it failing, with the optimize left in. Which exact
> > rev of 2.9 are you testing with? Which OS/filesystem/JRE?
> >
> > I realize this is just a test, so what follows may not apply to your
> > "real" usage of SnapshotDeletionPolicy...:
> >
> > Since you're closing the writer before taking the backup, there's no
> > need to even use SnapshotDeletionPolicy (you can just copy all files
> > you find in the index).
> >
> > SnapshotDeletionPolicy's purpose is to enable taking backups while an
> > IndexWriter is still open & making ongoing changes to the index, ie a
> > "hot backup".
> >
> > Finally, you're taking the snapshot before doing any indexing... which
> > means your backup will only reflect the index as of the last commit
> > before you did indexing.
> >
> > Mike
> >
> > On Fri, Aug 14, 2009 at 4:55 PM, Lucas Nazário dos
> > Santos wrote:
> > > Not as small as I would like, but it shows the problem.
> > >
> > > If you remove the statement
> > >
> > >     // Remove this and the backup works fine
> > >     optimize(policy);
> > >
> > > the backup works wonderfully.
> > >
> > > (More code in the next e-mail)
> > >
> > > Lucas
> > >
> > >
> > > public static void main(final String[] args) throws
> > >         CorruptIndexException, IOException, InterruptedException {
> > >     final SnapshotDeletionPolicy policy = new
> > >         SnapshotDeletionPolicy(new KeepOnlyLastCommitDeletionPolicy());
> > >     final IndexBackup backup = new IndexBackup(policy,
> > >         new File("backup"));
> > >     for (int i = 0; i < 3; i++) {
> > >         index(policy, backup);
> > >     }
> > > }
> > >
> > > private static void index(final SnapshotDeletionPolicy policy,
> > >         final IndexBackup backup) throws CorruptIndexException,
> > >         LockObtainFailedException, IOException {
> > >     IndexWriter writer = null;
> > >     try {
> > >         FSDirectory.setDisableLocks(true);
> > >         writer = new IndexWriter(FSDirectory.getDirectory("index"),
> > >             new StandardAnalyzer(), policy, MaxFieldLength.UNLIMITED);
> > >
> > >         System.out.println("Start: " + backup.willBackupFromNowOn());
> > >
> > >         for (int i = 0; i < 10000; i++) {
> > >             final Document document = new Document();
> > >             document.add(new Field("content",
> > >                 "content content content content",
> > >                 Store.YES, Index.ANALYZED));
> > >             writer.addDocument(document);
> > >         }
> > >     } finally {
> > >         if (writer != null) {
> > >             writer.close();
> > >         }
> > >
> > >         System.out.println("Backup: " + backup.backup());
> > >
> > >         // Remove this and the backup works fine
> > >         optimize(policy);
> > >     }
> > > }
> > >
> > > private static void optimize(final SnapshotDeletionPolicy policy)
> > >         throws CorruptIndexException, LockObtainFailedException,
> > >         IOException {
> > >     IndexWriter writer = null;
> > >     try {
> > >         writer = new IndexWriter(FSDirectory.getDirectory("index"),
> > >             new StandardAnalyzer(), policy, MaxFieldLength.UNLIMITED);
> > >         writer.optimize();
> > >     } finally {
> > >         writer.close();
> > >     }
> > > }
> > >
> > >
> > > On Fri, Aug 14, 2009 at 4:37 PM, Shai Erera wrote:
> > >
> > >> I think you should also delete files that don't exist anymore in
> > >> the index, from the backup?
> > >>
> > >> Shai
> > >>
> > >> On Fri, Aug 14, 2009 at 10:02 PM, Michael McCandless <
> > >> lucene@mikemccandless.com> wrote:
> > >>
> > >> > Could you boil this down to a small standalone program showing
> > >> > the problem?
> > >> >
> > >> > Optimizing in between backups should be completely fine.
> > >> >
> > >> > Mike
> > >> >
> > >> > On Fri, Aug 14, 2009 at 2:47 PM, Lucas Nazário dos
> > >> > Santos wrote:
> > >> > > Hi,
> > >> > >
> > >> > > I'm using the SnapshotDeletionPolicy class to back up my index.
> > >> > > I basically call the snapshot() method from the class
> > >> > > SnapshotDeletionPolicy at some point, get a list of files that
> > >> > > changed, copy them to the backup folder, and
> > >> > > finish by calling the release() method.
> > >> > >
> > >> > > The problem arises when, in between backups, I optimize the
> > >> > > index by opening it with the IndexWriter class and calling the
> > >> > > optimize() method. When I don't optimize in between backups,
> > >> > > here is what happens:
> > >> > >
> > >> > > The first backup copies the segment composed of the files
> > >> > > _0.cfs and segments_2. The second backup copies the files
> > >> > > _1.cfs and segments_3, and the third backup copies the files
> > >> > > _2.cfs and segments_4. I can open the backup folder with Luke
> > >> > > without problems.
> > >> > >
> > >> > > When I do optimize in between backups, the copies are as
> > >> > > follows:
> > >> > >
> > >> > > The first backup copies the segment composed of the files
> > >> > > _0.cfs and segments_2. The second backup copies the files
> > >> > > _1.cfs and segments_3, and the third backup copies the files
> > >> > > _3.cfs and segments_5. In this case, when I try to open the
> > >> > > backup folder, Luke gives a message saying that it can't find
> > >> > > the file _2.cfs.
> > >> > >
> > >> > > My question is: how can I back up my index using the
> > >> > > SnapshotDeletionPolicy while having to optimize the index in
> > >> > > between backups? Am I using the right backup strategy?
> > >> > >
> > >> > > Thanks,
> > >> > > Lucas
> > >> > >
> > >> >
> > >> > ---------------------------------------------------------------------
> > >> > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > >> > For additional commands, e-mail: java-user-help@lucene.apache.org
> > >> >
> > >>
> > >
> >
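[Editor's note: Shai's steps (2)-(4) above amount to a one-way sync of the snapshot's file list into the backup folder. Below is a minimal sketch of just that sync logic, using only java.nio.file (Java 7+). The class and method names are hypothetical; in a real Lucene setup the liveFiles set would come from SnapshotDeletionPolicy.snapshot().getFileNames() while the snapshot is held, with release() called once copying finishes.]

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class IndexBackupSync {

    // Bring backupDir in sync with the set of live index file names:
    // copy files that are missing from the backup, then delete backup
    // files that no longer belong to the current snapshot (e.g. segments
    // dropped by an optimize/merge).
    public static void sync(Path indexDir, Path backupDir,
                            Set<String> liveFiles) throws IOException {
        Files.createDirectories(backupDir);
        for (String name : liveFiles) {
            Path dst = backupDir.resolve(name);
            if (!Files.exists(dst)) {
                Files.copy(indexDir.resolve(name), dst);
            }
        }
        try (DirectoryStream<Path> existing =
                 Files.newDirectoryStream(backupDir)) {
            for (Path p : existing) {
                if (!liveFiles.contains(p.getFileName().toString())) {
                    Files.delete(p); // stale file from an earlier commit
                }
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Path src = Files.createTempDirectory("index");
        Path dst = Files.createTempDirectory("backup");
        Files.write(src.resolve("_0.cfs"), new byte[]{1});
        Files.write(src.resolve("segments_2"), new byte[]{2});
        // A leftover from a previous backup that the snapshot no longer lists:
        Files.write(dst.resolve("_old.cfs"), new byte[]{3});

        sync(src, dst, new HashSet<>(Arrays.asList("_0.cfs", "segments_2")));

        System.out.println(Files.exists(dst.resolve("_0.cfs")));   // true
        System.out.println(Files.exists(dst.resolve("_old.cfs"))); // false
    }
}
```

Because files already present in the backup are skipped, repeated runs copy only the new segments, which is the incremental behaviour Lucas is after; the deletion pass is what keeps the backup openable after an optimize replaces old segment files.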