zookeeper-user mailing list archives

From jack ma <jackma1...@gmail.com>
Subject Re: Efficient backup and a reasonable restore of an ensemble
Date Tue, 16 Jul 2013 15:38:10 GMT
Does anyone have answers to Sergey's questions?

I want to make sure I fully understand the procedures for zookeeper
backup and disaster recovery:

Backup procedure for a zookeeper ensemble:
(1) Log in to any host whose state is "Serving".
           Question:
                  Do I have to log in to the leader node, or is any node OK?
(2) Copy the latest snapshot file and transaction log from the version-2
directory.
           Question:
                  How do we make sure we do not copy corrupt files if the
snapshot/transaction log is in the middle of an update? Do we have to shut
down the node to make the copy?
                  Besides the transaction log and snapshot, do we have to
copy other files, such as the epoch files?
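
Picking the files for step (2) could look roughly like the sketch below.
It is only an illustration, not an official tool. It relies on ZooKeeper
naming its files snapshot.<zxid> and log.<zxid> with a hexadecimal suffix;
the function names (zxid_of, scan, files_to_back_up) are made up for this
example. It keeps the newest snapshot, the transaction log that starts at
or before that snapshot, and every later log. It does not detect a file
that is still being written, so the corruption question above stays open.

```python
import re
from pathlib import Path

# ZooKeeper data files are named snapshot.<hex zxid> and log.<hex zxid>.
_NAME = re.compile(r"^(snapshot|log)\.([0-9a-fA-F]+)$")


def zxid_of(path, prefix):
    """Return the zxid encoded in the file name, or None if it does not match."""
    m = _NAME.match(Path(path).name)
    if m and m.group(1) == prefix:
        return int(m.group(2), 16)
    return None


def scan(directory, prefix):
    """List (zxid, path) pairs for matching files, sorted by zxid."""
    found = []
    for p in Path(directory).iterdir():
        z = zxid_of(p, prefix)
        if z is not None:
            found.append((z, p))
    return sorted(found)


def files_to_back_up(snap_dir, log_dir):
    """Pick the newest snapshot and the txn logs that may hold later changes."""
    snaps = scan(snap_dir, "snapshot")
    if not snaps:
        raise FileNotFoundError(f"no snapshot files in {snap_dir}")
    snap_zxid, snap = snaps[-1]

    logs = scan(log_dir, "log")
    # The log with the largest starting zxid <= the snapshot zxid may still
    # contain transactions newer than the snapshot, so keep it and all
    # logs that start after it.
    start = max((z for z, _ in logs if z <= snap_zxid), default=None)
    keep = [p for z, p in logs if start is None or z >= start]
    return snap, keep
```

Both arguments are the version-2 directories; they are the same directory
when snapshots and transaction logs are not stored separately.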

Disaster recovery procedure for a zookeeper ensemble:
(1) Recreate the machines for the zookeeper ensemble.
(2) Copy the snapshot/transaction log we backed up into the zookeeper
dataDir\version-2 and dataLogDir\version-2.
           Question:
                 Do we have to copy the epoch files?
                 Do we have to copy the backed-up snapshot/transaction log to
all the zookeeper nodes, or just to the first node we start?
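
Step (2) on a single stopped node might be scripted along these lines. This
is a sketch under assumptions, not a verified recovery tool: it follows the
procedure quoted further down in this thread (clear the data dir, copy over
the snapshot and txnlog, restart), the function name restore_node and the
flat backup directory layout are hypothetical, and it leaves the epoch
files out because the thread does not settle whether they are needed. Only
the version-2 directories are cleared, so the myid file in dataDir is left
alone.

```python
import shutil
from pathlib import Path


def restore_node(backup_dir, data_dir, data_log_dir=None):
    """Replace a stopped node's version-2 contents with backed-up files.

    backup_dir is assumed to be a flat directory holding snapshot.* and
    log.* files. data_log_dir defaults to data_dir when logs and
    snapshots share one directory.
    """
    snap_target = Path(data_dir) / "version-2"
    log_target = Path(data_log_dir or data_dir) / "version-2"

    # Clear the old state. A set avoids wiping the same directory twice
    # when both targets are identical.
    for target in {snap_target, log_target}:
        if target.exists():
            shutil.rmtree(target)
        target.mkdir(parents=True)

    copied = []
    for f in sorted(Path(backup_dir).iterdir()):
        if f.name.startswith("snapshot."):
            dest = snap_target / f.name
        elif f.name.startswith("log."):
            dest = log_target / f.name
        else:
            continue  # assumption: epoch files and anything else are skipped
        shutil.copy2(f, dest)
        copied.append(dest)
    return copied
```

The server process has to be stopped before this runs, and whether to run
it on every node or only on the first one started is exactly the open
question above.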

Appreciate your time and help.
Jack


On Mon, Jul 8, 2013 at 9:25 PM, Sergey Maslyakov <evolvah@gmail.com> wrote:

> These are interesting points, Thawan. I'd like to make sure that I get them
> right.
>
> 1. Are you saying that a snapshot file may not be sufficient to restore
> Zookeeper to a consistent state? Does it always require a transaction log
> file or is it required to get to the most current state? I was hoping that
> a snapshot is self-sufficient to do a restore to recent but not necessarily
> most current state. Was I wrong?
>
> 2. Do you suggest that the same pair of a snapshot (and a transaction log)
> needs to be copied to all servers before they are brought online? Then what
> about the "epoch" files? Do they need to be purged, preserved, or the same
> one populated through the whole ensemble?
>
>
> On Mon, Jul 8, 2013 at 7:53 PM, Thawan Kooburat <thawan@fb.com> wrote:
>
> > Just saw that this is the use case corresponding to the question posted
> > on the dev list.
> >
> > In order to restore the data to a given point in time correctly, you need
> > both the snapshot and the txnlog. This is because a ZooKeeper snapshot is
> > fuzzy, and a snapshot alone may not represent a valid state of the server
> > if there are in-flight requests.
> >
> > The 4lw command should cause the server to roll the log and take a
> > snapshot, similar to the periodic snapshotting operation. Your backup
> > script needs to grab the snapshot and the corresponding txnlog file from
> > the data dir.
> >
> > To restore, just shut down all hosts, clear the data dir, copy over the
> > snapshot and txnlog, and restart them.
> >
> >
> > --
> > Thawan Kooburat
> >
> >
> >
> >
> >
> > On 7/8/13 3:28 PM, "Sergey Maslyakov" <evolvah@gmail.com> wrote:
> >
> > >Thank you for your response, Flavio. I apologize, I did not provide a
> > >clear explanation of the use case.
> > >
> > >This backup/restore is not intended to be tied to any write event.
> > >Instead, it is expected to run as a periodic (daily?) cron job on one of
> > >the servers, which is not guaranteed to be the leader of the ensemble.
> > >There is no expectation that all recent changes are committed and
> > >persisted to disk. The system can sustain the loss of several hours'
> > >worth of recent changes in the event of a restore.
> > >
> > >As for finding the leader dynamically and performing the backup on it,
> > >this approach could be more difficult, as the leader can change from time
> > >to time and I still need to fetch the file to store it in my designated
> > >backup location. Taking the backup on one server and picking it up from
> > >the local file system looks less error-prone. Even if I went the fancy
> > >route and had Zookeeper send me the serialized DataTree in response to
> > >the 4lw, this approach would involve a lot of moving parts.
> > >
> > >I have already made a PoC for a new 4lw that invokes takeSnapshot() and
> > >returns an absolute path to the snapshot it drops on disk. I have already
> > >protected takeSnapshot() from concurrent invocation, which would be
> > >likely to corrupt the snapshot file on disk. This approach works, but I'm
> > >thinking of taking it one step further by providing the desired path name
> > >as an argument to my new 4lw and having the Zookeeper server drop the
> > >snapshot into the specified file and report success/failure back. This
> > >way I can avoid cluttering the data directory and interfering with what
> > >Zookeeper finds when it scans the data directory.
> > >
> > >The approach of having an additional server that would take the
> > >leadership and populate the ensemble is just a theory. I don't see a
> > >clean way of making a quorum member the leader of the quorum. Am I
> > >overlooking something simple?
> > >
> > >In backup and restore of an ensemble, the biggest unknown for me remains
> > >populating the ensemble with the desired data. I can think of two ways:
> > >
> > >1. Clear out all servers by stopping them, purge the version-2
> > >directories, restore a snapshot file on the one server that will be
> > >brought up first, and then bring up the rest of the ensemble. This way I
> > >somewhat force the first server to be the leader, because it has data and
> > >it will be the only member of the quorum with data, given the way I start
> > >the ensemble. This looks like a hack, though.
> > >
> > >2. Clear out the ensemble and reload it with a dedicated client using the
> > >provided Zookeeper API.
> > >
> > >With the approach of backing up an actual snapshot file, option #1
> > >appears to be more practical.
> > >
> > >I wish I could start the ensemble with a designated leader that would
> > >bootstrap the ensemble with data, and then the ensemble would go about
> > >its normal business...
> > >
> > >
> > >
> > >On Mon, Jul 8, 2013 at 4:30 PM, Flavio Junqueira <fpjunqueira@yahoo.com> wrote:
> > >
> > >> One bit that is still a bit confusing to me in your use case is whether
> > >> you need to take a snapshot right after some event in your application.
> > >> Even if you're able to tell ZooKeeper to take a snapshot, there is no
> > >> guarantee that it will happen at the exact point you want it if update
> > >> operations keep coming.
> > >>
> > >> If you use your four-letter word approach, then would you search for the
> > >> leader or would you simply take a snapshot at any server? If it has to
> > >> go through the leader so that you make sure to have the most recent
> > >> committed state, then it might not be a bad idea to have an API call
> > >> that tells the leader to take a snapshot at some directory of your
> > >> choice. Informing you of the name of the snapshot file so that you can
> > >> copy it sounds like an option, but perhaps it is not as convenient.
> > >>
> > >> The approach of adding another server is not very clear. How do you
> > >> force it to be the leader? Keep in mind that if it crashes, then it will
> > >> lose leadership.
> > >>
> > >> -Flavio
> > >>
> > >> On Jul 8, 2013, at 8:34 AM, Sergey Maslyakov <evolvah@gmail.com> wrote:
> > >>
> > >> > It looks like the "dev" mailing list is rather inactive. Over the past
> > >> > few days I only saw several automated emails from JIRA and that is
> > >> > pretty much it. In contrast, the "user" mailing list seems to be more
> > >> > alive and more populated.
> > >> >
> > >> > With this in mind, please allow me to cross-post here the message I
> > >> > sent to the "dev" list a few days ago.
> > >> >
> > >> >
> > >> > Regards,
> > >> > /Sergey
> > >> >
> > >> > === forwarded message begins here ===
> > >> >
> > >> > Hi!
> > >> >
> > >> > I'm facing a problem that has been raised by multiple people, but none
> > >> > of the discussion threads seem to provide a good answer. I dug into
> > >> > the Zookeeper source code trying to come up with some possible
> > >> > approaches and I would like to get your input on those.
> > >> >
> > >> > Initial conditions:
> > >> >
> > >> > * I have an ensemble of five Zookeeper servers running v3.4.5 code.
> > >> > * The size of a committed snapshot file is in the vicinity of 1GB.
> > >> > * There are about 80 clients connected to the ensemble.
> > >> > * Clients are heavily read-biased, i.e., they mostly read and rarely
> > >> > write. I would say less than 0.1% of queries modify the data.
> > >> >
> > >> > Problem statement:
> > >> >
> > >> > * Under certain conditions, I may need to revert the data stored in
> > >> > the ensemble to an earlier state. For example, one of the clients may
> > >> > ruin the application-level data integrity and I need to perform a
> > >> > disaster recovery.
> > >> >
> > >> > Things look nice and easy if I'm dealing with a single Zookeeper
> > >> > server. A file-level copy of the data and dataLog directories should
> > >> > allow me to recover later by stopping Zookeeper, swapping the
> > >> > corrupted data and dataLog directories with a backup, and firing
> > >> > Zookeeper back up.
> > >> >
> > >> > Now, the ensemble deployment and the leader election algorithm in the
> > >> > quorum make things much more difficult. In order to restore from a
> > >> > single file-level backup, I need to take the whole ensemble down, wipe
> > >> > out the data and dataLog directories on all servers, replace these
> > >> > directories with the backed-up content on one of the servers, bring
> > >> > this server up first, and then bring up the rest of the ensemble. This
> > >> > [somewhat] guarantees that the populated Zookeeper server becomes a
> > >> > member of a majority and populates the ensemble. This approach works,
> > >> > but it is very involved and, thus, prone to human error.
> > >> >
> > >> > Based on a study of the Zookeeper source code, I am considering the
> > >> > following alternatives, and I seek advice from the Zookeeper
> > >> > development community as to which approach looks more promising or
> > >> > whether there is a better way.
> > >> >
> > >> > Approach #1:
> > >> >
> > >> > Develop a complementary pair of utilities for export and import of the
> > >> > data. Both utilities will act as Zookeeper clients and use the
> > >> > existing API. The "export" utility will recursively retrieve data and
> > >> > store it in a file. The "import" utility will first purge all data
> > >> > from the ensemble and then reload it from the file.
> > >> >
> > >> > This approach seems to be the simplest and there are similar tools
> > >> > developed already. For example, the Guano Project:
> > >> > https://github.com/d2fn/guano
> > >> >
> > >> > I don't like two things about it:
> > >> > * Poor performance even on a backup for a data store of my size.
> > >> > * Possible data consistency issues due to concurrent access by the
> > >> > export utility as well as other "normal" clients.
> > >> >
> > >> > Approach #2:
> > >> >
> > >> > Add another four-letter command that would force rolling up the
> > >> > transactions and creating a snapshot. The result of this command would
> > >> > be a new snapshot.XXXX file on disk, and the name of the file could be
> > >> > reported back to the client as a response to the four-letter command.
> > >> > This way, I would know which snapshot file to grab for a possible
> > >> > future restore. But restoring from a snapshot file is almost as
> > >> > involved as the error-prone sequence described under "Problem
> > >> > statement" above.
> > >> >
> > >> > Approach #3:
> > >> >
> > >> > Come up with a way to temporarily add a new Zookeeper server into a
> > >> > live ensemble that would take over (how?) the leader role and push out
> > >> > the snapshot that it has to all ensemble members upon restore. This
> > >> > approach could be difficult and error-prone to implement because it
> > >> > will require hacking the existing election algorithm to designate a
> > >> > leader.
> > >> >
> > >> > So, which of the approaches do you think works best for an ensemble
> > >> > and for a database size of about 1GB?
> > >> >
> > >> >
> > >> > Any advice will be highly appreciated!
> > >> > /Sergey
> > >>
> > >>
> >
> >
>
