Subject: Re: Problem doing backup using the SnapshotDeletionPolicy class
From: Shai Erera
To: java-user@lucene.apache.org
Date: Mon, 17 Aug 2009 21:48:39 +0300

The way I'd do the backups is by having a background thread that is
scheduled to run every X hours/days (depending on how frequently you want
to do the backups), and when it wakes up it:

1) Creates a SnapshotDeletionPolicy and retrieves the file names.
2) Lists the files in the backup folder.
3) Copies the files from (1) that do not appear in (2).
4) Deletes the files in (2) that do not appear in (1) --> that deletes
   segments that do not exist anymore.

But that is because I want to do the backups while the system is up and
running, and indexing and search operations occur.

From what you describe below, you index in batches and then indexing
stops. So you can just do a directory listing (after everything stops -
indexing, optimize ...)
and basically repeat steps (2) - (4) from above, I think.

Hope this helps,
Shai

On Mon, Aug 17, 2009 at 3:59 PM, Lucas Nazário dos Santos <
nazario.lucas@gmail.com> wrote:

> Thanks Mike.
>
> I'm using Windows XP with Java 1.6.0 and Lucene 2.4.1.
>
> I don't know if I'm using the right backup strategy. I have an indexing
> process that happens from time to time, and the index is getting bigger
> every day. Hence, copying the whole index every time as a backup
> strategy is becoming a painful, endless activity. Given this scenario, I
> wouldn't like to copy the entire index each time I do a backup, but only
> the difference from the previous backup. Here is the plan:
>
> 1. Create an index writer;
> 2. Take a snapshot to prevent existing files from changing;
> 3. Start indexing until no more documents to be indexed exist;
> 4. Retrieve a list of index files that didn't change with
>    this.snapshotDeletionPolicy.snapshot().getFileNames();
> 5. Copy all files inside the index folder other than those retrieved in
>    step 4 to the backup directory;
> 6. Optimize and close the index.
>
> I really don't know if what I'm doing is the best approach to my
> problem. I believe that people use the SnapshotDeletionPolicy to copy
> all of the index up to the last commit. I want to copy only the
> difference since the last backup.
>
> What I'm thinking about doing now is to index into a new directory at
> every indexing iteration, copy this index to the backup folder, and
> merge it with the main index afterwards.
>
> What do you guys think I should do?
>
> Lucas
>
>
> On Fri, Aug 14, 2009 at 8:05 PM, Michael McCandless <
> lucene@mikemccandless.com> wrote:
>
> > Alas I don't see it failing, with the optimize left in. Which exact
> > rev of 2.9 are you testing with? Which OS/filesystem/JRE?
> >
> > I realize this is just a test, so what follows may not apply to your
> > "real" usage of SnapshotDeletionPolicy...:
> >
> > Since you're closing the writer before taking the backup, there's no
> > need to even use SnapshotDeletionPolicy (you can just copy all files
> > you find in the index).
> >
> > SnapshotDeletionPolicy's purpose is to enable taking backups while an
> > IndexWriter is still open & making ongoing changes to the index, ie a
> > "hot backup".
> >
> > Finally, you're taking the snapshot before doing any indexing... which
> > means your backup will only reflect the index as of the last commit
> > before you did indexing.
> >
> > Mike
> >
> > On Fri, Aug 14, 2009 at 4:55 PM, Lucas Nazário dos
> > Santos wrote:
> > > Not as small as I would like, but it shows the problem.
> > >
> > > If you remove the statement
> > >
> > >     // Remove this and the backup works fine
> > >     optimize(policy);
> > >
> > > the backup works wonderfully.
> > >
> > > (More code in the next e-mail)
> > >
> > > Lucas
> > >
> > >
> > > public static void main(final String[] args) throws
> > >         CorruptIndexException, IOException, InterruptedException {
> > >     final SnapshotDeletionPolicy policy = new
> > >         SnapshotDeletionPolicy(new KeepOnlyLastCommitDeletionPolicy());
> > >     final IndexBackup backup = new IndexBackup(policy,
> > >         new File("backup"));
> > >     for (int i = 0; i < 3; i++) {
> > >         index(policy, backup);
> > >     }
> > > }
> > >
> > > private static void index(final SnapshotDeletionPolicy policy,
> > >         final IndexBackup backup) throws CorruptIndexException,
> > >         LockObtainFailedException, IOException {
> > >     IndexWriter writer = null;
> > >     try {
> > >         FSDirectory.setDisableLocks(true);
> > >         writer = new IndexWriter(FSDirectory.getDirectory("index"),
> > >             new StandardAnalyzer(), policy, MaxFieldLength.UNLIMITED);
> > >
> > >         System.out.println("Start: " + backup.willBackupFromNowOn());
> > >
> > >         for (int i = 0; i < 10000; i++) {
> > >             final Document document = new Document();
> > >             document.add(new Field("content",
> > >                 "content content content content",
> > >                 Store.YES, Index.ANALYZED));
> > >             writer.addDocument(document);
> > >         }
> > >     } finally {
> > >         if (writer != null) {
> > >             writer.close();
> > >         }
> > >
> > >         System.out.println("Backup: " + backup.backup());
> > >
> > >         // Remove this and the backup works fine
> > >         optimize(policy);
> > >     }
> > > }
> > >
> > > private static void optimize(final SnapshotDeletionPolicy policy)
> > >         throws CorruptIndexException, LockObtainFailedException,
> > >         IOException {
> > >     IndexWriter writer = null;
> > >     try {
> > >         writer = new IndexWriter(FSDirectory.getDirectory("index"),
> > >             new StandardAnalyzer(), policy, MaxFieldLength.UNLIMITED);
> > >         writer.optimize();
> > >     } finally {
> > >         writer.close();
> > >     }
> > > }
> > >
> > >
> > > On Fri, Aug 14, 2009 at 4:37 PM, Shai Erera wrote:
> > >
> > >> I think you should also delete files that don't exist anymore in
> > >> the index, from the backup?
> > >>
> > >> Shai
> > >>
> > >> On Fri, Aug 14, 2009 at 10:02 PM, Michael McCandless <
> > >> lucene@mikemccandless.com> wrote:
> > >>
> > >> > Could you boil this down to a small standalone program showing
> > >> > the problem?
> > >> >
> > >> > Optimizing in between backups should be completely fine.
> > >> >
> > >> > Mike
> > >> >
> > >> > On Fri, Aug 14, 2009 at 2:47 PM, Lucas Nazário dos
> > >> > Santos wrote:
> > >> > > Hi,
> > >> > >
> > >> > > I'm using the SnapshotDeletionPolicy class to back up my index.
> > >> > > I basically call the snapshot() method from the class
> > >> > > SnapshotDeletionPolicy at some point, get a list of files that
> > >> > > changed, copy them to the backup folder, and
> > >> > > finish by calling the release() method.
> > >> > >
> > >> > > The problem arises when, in between backups, I optimize the
> > >> > > index by opening it with the IndexWriter class and calling the
> > >> > > optimize() method. When I don't optimize in between backups,
> > >> > > here is what happens:
> > >> > >
> > >> > > The first backup copies the segment composed of the files
> > >> > > _0.cfs and segments_2. The second backup copies the files
> > >> > > _1.cfs and segments_3, and the third backup copies the files
> > >> > > _2.cfs and segments_4. I can open the backup folder with Luke
> > >> > > without problems.
> > >> > >
> > >> > > When I do optimize in between backups, the copies are as
> > >> > > follows:
> > >> > >
> > >> > > The first backup copies the segment composed of the files
> > >> > > _0.cfs and segments_2. The second backup copies the files
> > >> > > _1.cfs and segments_3, and the third backup copies the files
> > >> > > _3.cfs and segments_5. In this case, when I try to open the
> > >> > > backup folder, Luke gives a message saying that it can't find
> > >> > > the file _2.cfs.
> > >> > >
> > >> > > My question is: how can I back up my index using the
> > >> > > SnapshotDeletionPolicy while having to optimize the index in
> > >> > > between backups? Am I using the right backup strategy?
> > >> > >
> > >> > > Thanks,
> > >> > > Lucas
> > >> > >
> > >> >
> > >> > ---------------------------------------------------------------------
> > >> > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > >> > For additional commands, e-mail: java-user-help@lucene.apache.org
> > >> >
> > >>
> > >
> >
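[Editor's note: Shai's steps (2)-(4) above amount to a one-way sync of the snapshot's file list into the backup folder. Below is a minimal sketch of just that sync logic, using only java.nio.file (Java 7+). The class and method names are hypothetical; in a real Lucene setup the liveFiles set would come from SnapshotDeletionPolicy.snapshot().getFileNames() while the snapshot is held, with release() called once copying finishes.]

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class IndexBackupSync {

    // Bring backupDir in sync with the set of live index file names:
    // copy files that are missing from the backup, then delete backup
    // files that no longer belong to the current snapshot (e.g. segments
    // dropped by an optimize/merge).
    public static void sync(Path indexDir, Path backupDir,
                            Set<String> liveFiles) throws IOException {
        Files.createDirectories(backupDir);
        for (String name : liveFiles) {
            Path dst = backupDir.resolve(name);
            if (!Files.exists(dst)) {
                Files.copy(indexDir.resolve(name), dst);
            }
        }
        try (DirectoryStream<Path> existing =
                 Files.newDirectoryStream(backupDir)) {
            for (Path p : existing) {
                if (!liveFiles.contains(p.getFileName().toString())) {
                    Files.delete(p); // stale file from an earlier commit
                }
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Path src = Files.createTempDirectory("index");
        Path dst = Files.createTempDirectory("backup");
        Files.write(src.resolve("_0.cfs"), new byte[]{1});
        Files.write(src.resolve("segments_2"), new byte[]{2});
        // A leftover from a previous backup that the snapshot no longer lists:
        Files.write(dst.resolve("_old.cfs"), new byte[]{3});

        sync(src, dst, new HashSet<>(Arrays.asList("_0.cfs", "segments_2")));

        System.out.println(Files.exists(dst.resolve("_0.cfs")));   // true
        System.out.println(Files.exists(dst.resolve("_old.cfs"))); // false
    }
}
```

Because files already present in the backup are skipped, repeated runs copy only the new segments, which is the incremental behaviour Lucas is after; the deletion pass is what keeps the backup openable after an optimize replaces old segment files.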