zookeeper-user mailing list archives

From Sergey Maslyakov <evol...@gmail.com>
Subject Fwd: Efficient backup and a reasonable restore of an ensemble
Date Mon, 08 Jul 2013 06:34:50 GMT
It looks like the "dev" mailing list is rather inactive. Over the past few
days I saw only a few automated emails from JIRA, and that is pretty much
it. In contrast, the "user" mailing list seems to be more alive and
more populated.

With this in mind, please allow me to cross-post here the message I sent
to the "dev" list a few days ago.


=== forwarded message begins here ===


I'm facing a problem that has been raised by multiple people, but none of
the discussion threads seems to provide a good answer. I dug into the Zookeeper
source code trying to come up with some possible approaches, and I would
like to get your input on those.

Initial conditions:

* I have an ensemble of five Zookeeper servers running v3.4.5 code.
* The size of a committed snapshot file is in the vicinity of 1GB.
* There are about 80 clients connected to the ensemble.
* Clients are heavily read-biased, i.e., they mostly read and rarely write. I
would say less than 0.1% of queries modify the data.

Problem statement:

* Under certain conditions, I may need to revert the data stored in the
ensemble to an earlier state. For example, one of the clients may ruin the
application-level data integrity and I need to perform a disaster recovery.

Things look nice and easy if I'm dealing with a single Zookeeper server. A
file-level copy of the data and dataLog directories should allow me to
recover later by stopping Zookeeper, swapping the corrupted data and
dataLog directories with a backup, and firing Zookeeper back up.
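A minimal sketch of the single-server copy-and-swap described above. The function names and the timestamped folder layout are illustrative assumptions, not part of Zookeeper, and the restore step assumes the server has already been stopped.

```python
import shutil
import time
from pathlib import Path


def backup_dirs(data_dir, data_log_dir, backup_root):
    """Copy the data and dataLog directories into a new timestamped folder."""
    stamp = time.strftime("%Y%m%dT%H%M%S")
    target = Path(backup_root) / f"zk-backup-{stamp}"
    # copytree creates the target (and any missing parents) itself.
    shutil.copytree(data_dir, target / "data")
    shutil.copytree(data_log_dir, target / "dataLog")
    return target


def restore_dirs(backup, data_dir, data_log_dir):
    """Replace the current directories with the backed-up copies.

    Assumption: Zookeeper is stopped while this runs.
    """
    for name, dest in (("data", data_dir), ("dataLog", data_log_dir)):
        dest = Path(dest)
        if dest.exists():
            shutil.rmtree(dest)  # drop the corrupted state
        shutil.copytree(Path(backup) / name, dest)
```

Calling `backup_dirs` on a schedule and `restore_dirs` between a stop and a start of the server would cover the single-server case; the paths passed in are whatever dataDir and dataLogDir point to in the server configuration.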

Now, the ensemble deployment and the leader election algorithm in the
quorum make things much more difficult. In order to restore from a single
file-level backup, I need to take the whole ensemble down, wipe out data
and dataLog directories on all servers, replace these directories with
backed up content on one of the servers, bring this server up first, and
then bring up the rest of the ensemble. This [somewhat] guarantees that the
populated Zookeeper server becomes a member of a majority and populates the
ensemble. This approach works, but it is quite involved and therefore
prone to human error.
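Since the risk in the sequence above is mostly getting the order wrong, the ordering itself can be scripted. The sketch below only encodes the order of operations; `stop`, `wipe`, `restore` and `start` are hypothetical callbacks that would have to be supplied (for example via SSH or a configuration management tool). One caveat: each server's myid file lives in the data directory, so a real wipe hook would need to preserve or recreate it.

```python
def restore_ensemble(servers, seed, stop, wipe, restore, start):
    """Run the full-ensemble restore in the order described above.

    servers: list of server identifiers
    seed:    the one server that receives the backed-up directories
    stop, wipe, restore, start: hypothetical per-server callbacks
    """
    if seed not in servers:
        raise ValueError("seed must be one of the ensemble servers")

    # 1. Take the whole ensemble down.
    for server in servers:
        stop(server)

    # 2. Wipe the data and dataLog directories everywhere.
    for server in servers:
        wipe(server)

    # 3. Put the backup on the seed server and bring it up first.
    restore(seed)
    start(seed)

    # 4. Bring up the rest of the ensemble.
    for server in servers:
        if server != seed:
            start(server)
```

This does not make the procedure any more of a guarantee than it is when done by hand; it only removes the chance of skipping or reordering a step.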

Based on a study of the Zookeeper source code, I am considering the following
alternatives. I am seeking advice from the Zookeeper development community on
which approach looks more promising, or whether there is a better way.

Approach #1:

Develop a complementary pair of utilities for export and import of the
data. Both utilities will act as Zookeeper clients and use the existing
API. The "export" utility will recursively retrieve data and store it in a
file. The "import" utility will first purge all data from the ensemble and
then reload it from the file.
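A rough sketch of what such an export/import pair could look like. It is written against a duck-typed `client` object that is assumed to offer `get(path)` returning `(data, stat)`, `get_children(path)`, `exists(path)`, `set(path, data)` and `create(path, data)`; the kazoo Python client exposes methods with these names, but any wrapper with the same shape would do. The function names and the JSON/base64 file format are illustrative choices, and the purge step is left out.

```python
import base64
import json


def export_tree(client, root="/"):
    """Walk the tree from root and return {path: base64-encoded data}."""
    dump = {}
    stack = [root]
    while stack:
        path = stack.pop()
        data, _stat = client.get(path)
        # Znode data is binary, so encode it to keep the dump JSON-safe.
        dump[path] = base64.b64encode(data or b"").decode("ascii")
        for child in client.get_children(path):
            stack.append(path.rstrip("/") + "/" + child)
    return dump


def write_dump(dump, file_path):
    """Store the exported tree in a JSON file."""
    with open(file_path, "w", encoding="ascii") as handle:
        json.dump(dump, handle)


def import_tree(client, dump):
    """Recreate every znode from a dump produced by export_tree."""
    # A parent path is a strict prefix of its children, so plain
    # lexicographic order always visits parents first.
    for path in sorted(dump):
        data = base64.b64decode(dump[path])
        if client.exists(path):
            client.set(path, data)
        else:
            client.create(path, data)
```

Each znode is read with a separate request, so the dump is not a point-in-time image of the tree, and a walk over a 1GB data set means one round trip per znode. ACLs, ephemeral nodes and the built-in `/zookeeper` subtree are ignored here and would need explicit handling.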

This approach seems to be the simplest and there are similar tools
developed already. For example, the Guano Project:

I don't like two things about it:
* Poor performance even on a backup for the data store of my size.
* Possible data consistency issues due to concurrent access by the export
utility as well as other "normal" clients.

Approach #2:

Add another four-letter command that would force rolling up the
transactions and creating a snapshot. The result of this command would be a
new snapshot.XXXX file on disk and the name of the file could be reported
back to the client as a response to the four-letter command. This way, I
would know which snapshot file to grab for a possible future restore. But
restoring from a snapshot file is almost as involved as the error-prone
sequence described above.
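For context, a four-letter command is just four ASCII characters sent over a plain TCP connection to the client port; the server writes its reply and closes the connection. The sketch below shows a minimal client for that exchange. The function name is an illustrative assumption, and the snapshot-forcing command proposed here does not exist in v3.4.5, so only existing commands such as `ruok` or `stat` would get a meaningful reply today.

```python
import socket


def four_letter_word(host, port, command, timeout=5.0):
    """Send a four-letter command and return the server's full text reply."""
    if len(command) != 4:
        raise ValueError("command must be exactly four characters")
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(command.encode("ascii"))
        chunks = []
        while True:
            chunk = sock.recv(4096)
            if not chunk:  # the server closes the connection after replying
                break
            chunks.append(chunk)
    return b"".join(chunks).decode("utf-8", errors="replace")
```

If a snapshot command were added, a backup script could call this helper with the new command and parse the reported file name from the reply; the exact reply format would be up to whoever implements the command.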

Approach #3:

Come up with a way to temporarily add a new Zookeeper server to a live
ensemble that would take over (how?) the leader role and push out the
snapshot that it has to all ensemble members upon restore. This approach
could be difficult and error-prone to implement because it would require
hacking the existing election algorithm to designate a leader.

So, which of the approaches do you think works best for an ensemble and for
the database size of about 1GB?

Any advice will be highly appreciated!
