Subject: Re: Efficient backup and a reasonable restore of an ensemble
From: Flavio Junqueira
Date: Mon, 8 Jul 2013 23:30:18 +0200
To: user@zookeeper.apache.org

One part that is still a bit confusing to me in your use case is whether
you need to take a snapshot right after some event in your application.
Even if you're able to tell ZooKeeper to take a snapshot, there is no
guarantee that it will happen at the exact point you want if update
operations keep coming.

If you use your four-letter word approach, would you search for the
leader or simply take a snapshot at any server? If it has to go through
the leader so that you are sure to capture the most recent committed
state, then it might not be a bad idea to have an API call that tells
the leader to take a snapshot in a directory of your choice. Reporting
the name of the snapshot file back to you, so that you can copy it,
sounds like an option, but perhaps it is not as convenient.
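Just to illustrate the leader-search part: with the existing four-letter
interface, a client could probe each server with "srvr" and look at the
Mode line it reports. A minimal sketch (the host names below are
placeholders for your ensemble's addresses):

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.Socket;

public class FindLeader {
    public static void main(String[] args) throws Exception {
        // Placeholder addresses; substitute the members of your ensemble.
        String[] servers = {"zk1:2181", "zk2:2181", "zk3:2181",
                            "zk4:2181", "zk5:2181"};
        for (String server : servers) {
            String[] hostPort = server.split(":");
            try (Socket s = new Socket(hostPort[0],
                                       Integer.parseInt(hostPort[1]))) {
                // "srvr" makes the server dump its stats, including a
                // "Mode: leader|follower|standalone" line.
                s.getOutputStream().write("srvr".getBytes());
                s.getOutputStream().flush();
                BufferedReader in = new BufferedReader(
                        new InputStreamReader(s.getInputStream()));
                for (String line; (line = in.readLine()) != null; ) {
                    if (line.trim().equals("Mode: leader")) {
                        System.out.println("leader is " + server);
                    }
                }
            }
        }
    }
}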
The approach of adding another server is not very clear to me. How do
you force it to be the leader? Keep in mind that if it crashes, it will
lose leadership.

-Flavio

On Jul 8, 2013, at 8:34 AM, Sergey Maslyakov wrote:

> It looks like the "dev" mailing list is rather inactive. Over the past few
> days I only saw several automated emails from JIRA, and that is pretty much
> it. Contrary to this, the "user" mailing list seems to be more alive and
> more populated.
>
> With this in mind, please allow me to cross-post here the message I sent
> to the "dev" list a few days ago.
>
>
> Regards,
> /Sergey
>
> === forwarded message begins here ===
>
> Hi!
>
> I'm facing a problem that has been raised by multiple people, but none of
> the discussion threads seem to provide a good answer. I dug into the
> ZooKeeper source code trying to come up with some possible approaches, and
> I would like to get your input on them.
>
> Initial conditions:
>
> * I have an ensemble of five ZooKeeper servers running v3.4.5 code.
> * The size of a committed snapshot file is in the vicinity of 1GB.
> * There are about 80 clients connected to the ensemble.
> * Clients are heavily read-biased, i.e., they mostly read and rarely
> write. I would say less than 0.1% of queries modify the data.
>
> Problem statement:
>
> * Under certain conditions, I may need to revert the data stored in the
> ensemble to an earlier state. For example, one of the clients may ruin the
> application-level data integrity, and I need to perform a disaster
> recovery.
>
> Things look nice and easy if I'm dealing with a single ZooKeeper server. A
> file-level copy of the data and dataLog directories should allow me to
> recover later by stopping ZooKeeper, swapping the corrupted data and
> dataLog directories with a backup, and firing ZooKeeper back up.
>
> Now, the ensemble deployment and the leader election algorithm in the
> quorum make things much more difficult. In order to restore from a single
> file-level backup, I need to take the whole ensemble down, wipe out the
> data and dataLog directories on all servers, replace these directories
> with the backed-up content on one of the servers, bring this server up
> first, and then bring up the rest of the ensemble. This [somewhat]
> guarantees that the populated ZooKeeper server becomes a member of a
> majority and populates the ensemble. This approach works, but it is very
> involved and, thus, error-prone due to human error.
>
> Based on a study of the ZooKeeper source code, I am considering the
> following alternatives, and I seek advice from the ZooKeeper development
> community as to which approach looks more promising, or if there is a
> better way.
>
> Approach #1:
>
> Develop a complementary pair of utilities for export and import of the
> data. Both utilities will act as ZooKeeper clients and use the existing
> API. The "export" utility will recursively retrieve data and store it in a
> file. The "import" utility will first purge all data from the ensemble and
> then reload it from the file.
>
> This approach seems to be the simplest, and similar tools have been
> developed already. For example, the Guano Project:
> https://github.com/d2fn/guano
>
> I don't like two things about it:
> * Poor performance, even for a backup of a data store of my size.
> * Possible data consistency issues due to concurrent access by the export
> utility as well as other "normal" clients.
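> To make the export side concrete, here is roughly what I have in mind,
> using only the standard client API (the connect string and file name are
> placeholders, and this is a sketch rather than what Guano does). Note that
> it suffers from exactly the consistency caveat above, since the tree is
> read node by node while other clients may be writing:
>
> import java.io.DataOutputStream;
> import java.io.FileOutputStream;
> import org.apache.zookeeper.ZooKeeper;
> import org.apache.zookeeper.data.Stat;
>
> public class Export {
>     public static void main(String[] args) throws Exception {
>         ZooKeeper zk = new ZooKeeper("zk1:2181", 30000, null);
>         try (DataOutputStream out = new DataOutputStream(
>                 new FileOutputStream("zk-backup.dump"))) {
>             dump(zk, "/", out);
>         } finally {
>             zk.close();
>         }
>     }
>
>     // Depth-first walk: write each path and its data. There is no
>     // point-in-time consistency; concurrent updates can interleave
>     // with the reads.
>     static void dump(ZooKeeper zk, String path, DataOutputStream out)
>             throws Exception {
>         byte[] data = zk.getData(path, false, new Stat());
>         out.writeUTF(path);
>         out.writeInt(data == null ? -1 : data.length);
>         if (data != null) out.write(data);
>         for (String child : zk.getChildren(path, false)) {
>             dump(zk, "/".equals(path) ? "/" + child : path + "/" + child,
>                  out);
>         }
>     }
> }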
> Approach #2:
>
> Add another four-letter command that would force the server to roll its
> transaction log and create a snapshot. The result of this command would be
> a new snapshot.XXXX file on disk, and the name of the file could be
> reported back to the client as a response to the four-letter command. This
> way, I would know which snapshot file to grab for a possible future
> restore. But restoring from a snapshot file is almost as involved as the
> error-prone sequence described above.
>
> Approach #3:
>
> Come up with a way to temporarily add a new ZooKeeper server to a live
> ensemble that would take over (how?) the leader role and push the snapshot
> that it has out to all ensemble members upon restore. This approach could
> be difficult and error-prone to implement because it will require hacking
> the existing election algorithm to designate a leader.
>
> So, which of the approaches do you think works best for an ensemble and
> for a database size of about 1GB?
>
>
> Any advice will be highly appreciated!
> /Sergey
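>
> P.S. For Approach #2, the server-side hook looks small from my reading of
> the 3.4 code. Below is a sketch modeled on the existing four-letter
> command handlers in NIOServerCnxn; the "snap" command itself is
> hypothetical, the class and method names are from my reading of the
> source, and I have not tested this:
>
> // Inside NIOServerCnxn, next to StatCommand, ConsCommand, etc.
> private class SnapCommand extends CommandThread {
>     public SnapCommand(PrintWriter pw) {
>         super(pw);
>     }
>
>     @Override
>     public void commandRun() {
>         if (zkServer == null) {
>             pw.println(ZK_NOT_SERVING);
>         } else {
>             // Serializes the in-memory tree to a new snapshot.<zxid>
>             // file. The snapshot is "fuzzy" with respect to in-flight
>             // updates, the same caveat as snapshots the server takes
>             // on its own.
>             zkServer.takeSnapshot();
>             pw.println("snapshot directory: "
>                     + zkServer.getTxnLogFactory().getSnapDir());
>         }
>     }
> }
> // ...plus a matching case in checkFourLetterWord() to recognize "snap"
> // and start the thread, as is done for the other commands.
>
> This takes the snapshot on whichever server receives the command, so it
> would still need the leader-location step discussed earlier in the thread.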