lucene-java-user mailing list archives

From Lucas Nazário dos Santos <nazario.lu...@gmail.com>
Subject Re: Problem doing backup using the SnapshotDeletionPolicy class
Date Mon, 17 Aug 2009 12:59:35 GMT
Thanks Mike.

I'm using Windows XP with Java 1.6.0 and Lucene 2.4.1.

I don't know if I'm using the right backup strategy. I have an indexing
process that runs from time to time, and the index gets bigger every day.
Copying the entire index on every backup is therefore becoming a painful,
endless activity. Given this scenario, I would rather not copy the whole
index each time I do a backup, but only the difference from the previous
backup. Here is the plan (a rough sketch of steps 4 and 5 follows the list):

1. Create an index writer;
2. Take a snapshot to prevent existing files from changing;
3. Start indexing until no more documents to be indexed exist;
4. Retrieve a list of index files that didn't change with
this.snapshotDeletionPolicy.snapshot().getFileNames();
5. Copy all files inside the index folder other than those retrieved in
step 4 to the backup directory;
6. Optimize and close the index.
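
Steps 4 and 5 in code would look roughly like this (an untested sketch;
copyFile() is a stand-in for whatever copy utility I end up using):

    // Untested sketch of steps 4 and 5; copyFile() is hypothetical.
    Set unchanged = new HashSet(snapshotDeletionPolicy.snapshot().getFileNames());
    try {
        File[] indexFiles = new File("index").listFiles();
        for (int i = 0; i < indexFiles.length; i++) {
            // Copy only the files the snapshot doesn't list.
            if (!unchanged.contains(indexFiles[i].getName())) {
                copyFile(indexFiles[i], new File("backup", indexFiles[i].getName()));
            }
        }
    } finally {
        // Let the deletion policy delete old files again.
        snapshotDeletionPolicy.release();
    }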

I really don't know whether what I'm doing is the best approach to my
problem. I believe people use the SnapshotDeletionPolicy to copy the whole
index up to the last commit. I want to copy only the difference since the
last backup.

What I'm thinking about doing now is to index into a new directory at every
indexing iteration, copy that small index to the backup folder, and merge it
with the main index afterwards.
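
A sketch of that per-iteration merge (untested; the directory names are
made up, and I assume Lucene 2.4's addIndexesNoOptimize() for the merge):

    // Untested sketch: build each batch in its own directory, back the
    // whole directory up, then merge it into the main index.
    Directory batchDir = FSDirectory.getDirectory("index-batch");
    IndexWriter batchWriter = new IndexWriter(batchDir, new StandardAnalyzer(),
            true, MaxFieldLength.UNLIMITED);
    // ... addDocument() calls for this iteration go here ...
    batchWriter.close();

    // Copy everything in "index-batch" to the backup folder. The batch is
    // self-contained, so there is no diffing against older backups.

    IndexWriter mainWriter = new IndexWriter(FSDirectory.getDirectory("index"),
            new StandardAnalyzer(), MaxFieldLength.UNLIMITED);
    mainWriter.addIndexesNoOptimize(new Directory[] { batchDir });
    mainWriter.close();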

What do you guys think I should do?

Lucas



On Fri, Aug 14, 2009 at 8:05 PM, Michael McCandless
<lucene@mikemccandless.com> wrote:

> Alas I don't see it failing, with the optimize left in.  Which exact
> rev of 2.9 are you testing with?  Which OS/filesystem/JRE?
>
> I realize this is just a test so what follows may not apply to your
> "real" usage of SnapshotDeletionPolicy...:
>
> Since you're closing the writer before taking the backup, there's no
> need to even use SnapshotDeletionPolicy (you can just copy all files
> you find in the index).
>
> SnapshotDeletionPolicy's purpose is to enable taking backups while an
> IndexWriter is still open & making ongoing changes to the index, i.e. a
> "hot backup".
>
> Finally, you're taking the snapshot before doing any indexing... which
> means your backup will only reflect the index as of the last commit
> before you did indexing.
>
> Mike
>
> On Fri, Aug 14, 2009 at 4:55 PM, Lucas Nazário dos
> Santos <nazario.lucas@gmail.com> wrote:
> > Not as small as I would like, but it shows the problem.
> >
> > If you remove the statement
> >
> > // Remove this and the backup works fine
> > optimize(policy);
> >
> > the backup works wonderfully.
> >
> > (More code in the next e-mail)
> >
> > Lucas
> >
> >
> > public static void main(final String[] args) throws CorruptIndexException,
> >         IOException, InterruptedException {
> >     final SnapshotDeletionPolicy policy =
> >             new SnapshotDeletionPolicy(new KeepOnlyLastCommitDeletionPolicy());
> >     final IndexBackup backup = new IndexBackup(policy, new File("backup"));
> >     for (int i = 0; i < 3; i++) {
> >         index(policy, backup);
> >     }
> > }
> >
> > private static void index(final SnapshotDeletionPolicy policy,
> >         final IndexBackup backup) throws CorruptIndexException,
> >         LockObtainFailedException, IOException {
> >     IndexWriter writer = null;
> >     try {
> >         FSDirectory.setDisableLocks(true);
> >         writer = new IndexWriter(FSDirectory.getDirectory("index"),
> >                 new StandardAnalyzer(), policy, MaxFieldLength.UNLIMITED);
> >
> >         System.out.println("Start: " + backup.willBackupFromNowOn());
> >
> >         for (int i = 0; i < 10000; i++) {
> >             final Document document = new Document();
> >             document.add(new Field("content",
> >                     "content content content content", Store.YES,
> >                     Index.ANALYZED));
> >             writer.addDocument(document);
> >         }
> >     } finally {
> >         if (writer != null) {
> >             writer.close();
> >         }
> >
> >         System.out.println("Backup: " + backup.backup());
> >
> >         // Remove this and the backup works fine
> >         optimize(policy);
> >     }
> > }
> >
> > private static void optimize(final SnapshotDeletionPolicy policy)
> >         throws CorruptIndexException, LockObtainFailedException, IOException {
> >     IndexWriter writer = null;
> >     try {
> >         writer = new IndexWriter(FSDirectory.getDirectory("index"),
> >                 new StandardAnalyzer(), policy, MaxFieldLength.UNLIMITED);
> >         writer.optimize();
> >     } finally {
> >         if (writer != null) {
> >             writer.close();
> >         }
> >     }
> > }
> >
> >
> > On Fri, Aug 14, 2009 at 4:37 PM, Shai Erera <serera@gmail.com> wrote:
> >
> >> I think you should also delete, from the backup, files that no longer
> >> exist in the index?
> >>
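> >> Something like this, run right after the copy (an untested sketch;
> >> "commit" is the snapshot the copy was taken from):
> >>
> >>     // Delete backup files the snapshotted commit no longer references.
> >>     Set current = new HashSet(commit.getFileNames());
> >>     File[] backedUp = new File("backup").listFiles();
> >>     for (int i = 0; i < backedUp.length; i++) {
> >>         if (!current.contains(backedUp[i].getName())) {
> >>             backedUp[i].delete();
> >>         }
> >>     }
> >>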
> >> Shai
> >>
> >> On Fri, Aug 14, 2009 at 10:02 PM, Michael McCandless
> >> <lucene@mikemccandless.com> wrote:
> >>
> >> > Could you boil this down to a small standalone program showing the
> >> > problem?
> >> >
> >> > Optimizing in between backups should be completely fine.
> >> >
> >> > Mike
> >> >
> >> > On Fri, Aug 14, 2009 at 2:47 PM, Lucas Nazário dos
> >> > Santos <nazario.lucas@gmail.com> wrote:
> >> > > Hi,
> >> > >
> >> > > I'm using the SnapshotDeletionPolicy class to back up my index. I
> >> > > basically call the snapshot() method from the SnapshotDeletionPolicy
> >> > > class at some point, get a list of files that changed, copy them to
> >> > > the backup folder, and finish by calling the release() method.
> >> > >
> >> > > The problem arises when, in between backups, I optimize the index by
> >> > > opening it with the IndexWriter class and calling the optimize()
> >> > > method. When I don't optimize in between backups, here is what happens:
> >> > >
> >> > > The first backup copies the segment composed of the files _0.cfs and
> >> > > segments_2. The second backup copies the files _1.cfs and segments_3,
> >> > > and the third backup copies the files _2.cfs and segments_4. I can
> >> > > open the backup folder with Luke without problems.
> >> > >
> >> > > When I do optimize in between backups, the copies are as follows:
> >> > >
> >> > > The first backup copies the segment composed of the files _0.cfs and
> >> > > segments_2. The second backup copies the files _1.cfs and segments_3,
> >> > > and the third backup copies the files _3.cfs and segments_5. In this
> >> > > case, when I try to open the backup folder, Luke gives a message
> >> > > saying that it can't find the file _2.cfs.
> >> > >
> >> > > My question is: how can I back up my index using the
> >> > > SnapshotDeletionPolicy while still optimizing the index in between
> >> > > backups? Am I using the right backup strategy?
> >> > >
> >> > > Thanks,
> >> > > Lucas
> >> > >
