Date: Tue, 18 Aug 2009 10:51:45 -0300
Subject: Re: Problem doing backup using the SnapshotDeletionPolicy class
From: Lucas Nazário dos Santos
To: java-user@lucene.apache.org

Thanks, Shai.

After thinking a bit I don't believe that the SnapshotDeletionPolicy is the
best approach to my problem. Because I do batch indexing, the final solution
I came up with is to index documents each time in a temporary folder, copy
this temporary folder to a backup directory after indexing, and merge it with
a complete index to maintain a unique structure. This way I can back up only
the difference since the last indexing took place.

Lucas

On Mon, Aug 17, 2009 at 3:48 PM, Shai Erera wrote:

> The way I'd do the backups is by having a background thread that is
> scheduled to run every X hours/days (depending on how frequently you want
> to do the backups) and, when it wakes, it:
> 1) Creates a SnapshotDeletionPolicy and retrieves the file names.
> 2) Lists the files in the backup folder.
> 3) Copies the files from (1) that do not appear in (2).
> 4) Deletes the files in (2) that do not appear in (1) --> that deletes
>    segments that do not exist anymore.
>
> But that is because I want to do the backups while the system is up and
> running, and indexing and search operations occur.
>
> From what you describe below, you index in batches and then indexing stops.
> So you can just do a directory listing (after everything stops - indexing,
> optimize ...) and basically repeat steps (2) - (4) from above, I think.
>
> Hope this helps,
> Shai
>
> On Mon, Aug 17, 2009 at 3:59 PM, Lucas Nazário dos Santos <
> nazario.lucas@gmail.com> wrote:
>
> > Thanks Mike.
> >
> > I'm using Windows XP with Java 1.6.0 and Lucene 2.4.1.
> >
> > I don't know if I'm using the right backup strategy. I have an indexing
> > process that happens from time to time and the index is getting bigger
> > every day. Hence, copying the whole index every time as a backup strategy
> > is becoming a painful, endless activity. Given this scenario, I wouldn't
> > like to copy the entire index each time I do a backup, but only the
> > difference from the previous backup. Here is the plan:
> >
> > 1. Create an index writer;
> > 2. Take a snapshot to prevent existing files from changing;
> > 3. Start indexing until no more documents to be indexed exist;
> > 4. Retrieve a list of index files that didn't change with
> >    this.snapshotDeletionPolicy.snapshot().getFileNames();
> > 5. Copy all files inside the index folder other than those retrieved in
> >    step 4 to the backup directory;
> > 6. Optimize and close the index.
> >
> > I really don't know if what I'm doing is the best approach to my problem.
> > I believe that people use the SnapshotDeletionPolicy to copy all the index
> > up to the last commit. I want to copy only the difference since last
> > backup.
> >
> > What I'm thinking about doing now is to index in a new directory at every
> > indexing iteration, copy this index to the backup folder and merge it with
> > the main index afterwards.
> >
> > What do you guys think I should do?
> >
> > Lucas
> >
> > On Fri, Aug 14, 2009 at 8:05 PM, Michael McCandless <
> > lucene@mikemccandless.com> wrote:
> >
> > > Alas I don't see it failing, with the optimize left in. Which exact
> > > rev of 2.9 are you testing with? Which OS/filesystem/JRE?
> > >
> > > I realize this is just a test so what follows may not apply to your
> > > "real" usage of SnapshotDeletionPolicy...:
> > >
> > > Since you're closing the writer before taking the backup, there's no
> > > need to even use SnapshotDeletionPolicy (you can just copy all files
> > > you find in the index).
> > >
> > > SnapshotDeletionPolicy's purpose is to enable taking backups while an
> > > IndexWriter is still open & making ongoing changes to the index, ie a
> > > "hot backup".
> > >
> > > Finally, you're taking the snapshot before doing any indexing... which
> > > means your backup will only reflect the index as of the last commit
> > > before you did indexing.
> > >
> > > Mike
> > >
> > > On Fri, Aug 14, 2009 at 4:55 PM, Lucas Nazário dos Santos wrote:
> > > > Not as small as I would like, but shows the problem.
> > > >
> > > > If you remove the statement
> > > >
> > > >     // Remove this and the backup works fine
> > > >     optimize(policy);
> > > >
> > > > the backup works wonderfully.
> > > >
> > > > (More code in the next e-mail)
> > > >
> > > > Lucas
> > > >
> > > >
> > > >     public static void main(final String[] args) throws
> > > >             CorruptIndexException, IOException, InterruptedException {
> > > >         final SnapshotDeletionPolicy policy = new SnapshotDeletionPolicy(
> > > >                 new KeepOnlyLastCommitDeletionPolicy());
> > > >         final IndexBackup backup = new IndexBackup(policy,
> > > >                 new File("backup"));
> > > >         for (int i = 0; i < 3; i++) {
> > > >             index(policy, backup);
> > > >         }
> > > >     }
> > > >
> > > >     private static void index(final SnapshotDeletionPolicy policy,
> > > >             final IndexBackup backup) throws CorruptIndexException,
> > > >             LockObtainFailedException, IOException {
> > > >         IndexWriter writer = null;
> > > >         try {
> > > >             FSDirectory.setDisableLocks(true);
> > > >             writer = new IndexWriter(FSDirectory.getDirectory("index"),
> > > >                     new StandardAnalyzer(), policy,
> > > >                     MaxFieldLength.UNLIMITED);
> > > >
> > > >             System.out.println("Star: " + backup.willBackupFromNowOn());
> > > >
> > > >             for (int i = 0; i < 10000; i++) {
> > > >                 final Document document = new Document();
> > > >                 document.add(new Field("content",
> > > >                         "content content content content",
> > > >                         Store.YES, Index.ANALYZED));
> > > >                 writer.addDocument(document);
> > > >             }
> > > >         } finally {
> > > >             if (writer != null) {
> > > >                 writer.close();
> > > >             }
> > > >
> > > >             System.out.println("Backup: " + backup.backup());
> > > >
> > > >             // Remove this and the backup works fine
> > > >             optimize(policy);
> > > >         }
> > > >     }
> > > >
> > > >     private static void optimize(final SnapshotDeletionPolicy policy)
> > > >             throws CorruptIndexException, LockObtainFailedException,
> > > >             IOException {
> > > >
> > > >         IndexWriter writer = null;
> > > >         try {
> > > >             writer = new IndexWriter(FSDirectory.getDirectory("index"),
> > > >                     new StandardAnalyzer(), policy,
> > > >                     MaxFieldLength.UNLIMITED);
> > > >             writer.optimize();
> > > >         } finally {
> > > >             // Guard against a failed constructor leaving writer null
> > > >             if (writer != null) {
> > > >                 writer.close();
> > > >             }
> > > >         }
> > > >     }
> > > >
> > > >
> > > > On Fri, Aug 14, 2009 at 4:37 PM, Shai Erera wrote:
> > > >
> > > >> I think you should also delete files that don't exist anymore in the
> > > >> index, from the backup?
> > > >>
> > > >> Shai
> > > >>
> > > >> On Fri, Aug 14, 2009 at 10:02 PM, Michael McCandless <
> > > >> lucene@mikemccandless.com> wrote:
> > > >>
> > > >> > Could you boil this down to a small standalone program showing the
> > > >> > problem?
> > > >> >
> > > >> > Optimizing in between backups should be completely fine.
> > > >> >
> > > >> > Mike
> > > >> >
> > > >> > On Fri, Aug 14, 2009 at 2:47 PM, Lucas Nazário dos Santos wrote:
> > > >> > > Hi,
> > > >> > >
> > > >> > > I'm using the SnapshotDeletionPolicy class to back up my index. I
> > > >> > > basically call the snapshot() method from the class
> > > >> > > SnapshotDeletionPolicy at some point, get a list of files that
> > > >> > > changed, copy them to the backup folder, and finish by calling the
> > > >> > > release() method.
> > > >> > >
> > > >> > > The problem arises when, in between backups, I optimize the index
> > > >> > > by opening it with the IndexWriter class and calling the optimize()
> > > >> > > method. When I don't optimize in between backups, here is what
> > > >> > > happens:
> > > >> > >
> > > >> > > The first backup copies the segment composed by the files _0.cfs
> > > >> > > and segments_2. The second backup copies the files _1.cfs and
> > > >> > > segments_3, and the third backup copies the files _2.cfs and
> > > >> > > segments_4. I can open the backup folder with Luke without
> > > >> > > problems.
> > > >> > >
> > > >> > > When I do optimize in between backups, the copies are as follows:
> > > >> > >
> > > >> > > The first backup copies the segment composed by the files _0.cfs
> > > >> > > and segments_2.
> > > >> > > The second backup copies the files _1.cfs and segments_3, and the
> > > >> > > third backup copies the files _3.cfs and segments_5. In this case,
> > > >> > > when I try to open the backup folder, Luke gives a message saying
> > > >> > > that it can't find the file _2.cfs.
> > > >> > >
> > > >> > > My question is: how can I back up my index using the
> > > >> > > SnapshotDeletionPolicy while having to optimize the index in
> > > >> > > between backups? Am I using the right backup strategy?
> > > >> > >
> > > >> > > Thanks,
> > > >> > > Lucas
> > > >> >
> > > >> > ---------------------------------------------------------------------
> > > >> > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > > >> > For additional commands, e-mail: java-user-help@lucene.apache.org
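
Steps (2) to (4) of the backup routine suggested in the thread (list the backup folder, copy the commit's files that are missing from it, delete backup files that the commit no longer references) can be sketched with plain file operations. This is only an illustration under stated assumptions, not code from the thread: BackupSync and sync are hypothetical names, the list of file names is assumed to come from something like snapshot().getFileNames() or from a directory listing taken after indexing has stopped, files are compared by name only (which assumes an index file is never rewritten under the same name), and it uses java.nio.file, so it needs Java 7 or later rather than the Java 1.6 mentioned above.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Collection;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical helper (not part of Lucene): mirrors one commit's files into a backup folder. */
final class BackupSync {

    private BackupSync() {
    }

    /**
     * @param indexDir      directory holding the live index
     * @param backupDir     directory holding the previous backup (created if missing)
     * @param snapshotFiles names of the files that make up the commit to back up
     * @return the number of files copied into the backup folder
     */
    static int sync(Path indexDir, Path backupDir, Collection<String> snapshotFiles)
            throws IOException {
        Files.createDirectories(backupDir);
        Set<String> wanted = new HashSet<>(snapshotFiles);

        // Step (2): list the regular files already present in the backup folder.
        Set<String> present = new HashSet<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(backupDir)) {
            for (Path entry : stream) {
                if (Files.isRegularFile(entry)) {
                    present.add(entry.getFileName().toString());
                }
            }
        }

        // Step (3): copy the commit's files that the backup does not have yet.
        // Assumption: a file already present under the same name is unchanged.
        int copied = 0;
        for (String name : wanted) {
            if (!present.contains(name)) {
                Files.copy(indexDir.resolve(name), backupDir.resolve(name),
                        StandardCopyOption.REPLACE_EXISTING);
                copied++;
            }
        }

        // Step (4): delete backup files that the commit no longer references,
        // for example segments merged away by an optimize.
        for (String name : present) {
            if (!wanted.contains(name)) {
                Files.delete(backupDir.resolve(name));
            }
        }
        return copied;
    }
}
```

Called once per backup run, for example as BackupSync.sync(Paths.get("index"), Paths.get("backup"), fileNames), it leaves the backup folder holding exactly the regular files named in that one commit; for a hot backup the call would have to sit between snapshot() and release() so the listed files are not deleted while they are being copied.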