zookeeper-user mailing list archives

From kishore g <g.kish...@gmail.com>
Subject Re: Efficient backup and a reasonable restore of an ensemble
Date Tue, 09 Jul 2013 05:08:44 GMT
It's not really elaborate; it is very similar to what ZooKeeper does when it
starts up. It first reads the latest snapshot file, then reads the transaction
logs and applies every transaction. What I am suggesting is that, instead of
applying all transactions, it stops at a transaction I provide.
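
To make the idea concrete, here is a minimal Python sketch of a replay that
stops at a caller-supplied zxid. It is not ZooKeeper code: the Txn record, the
two-operation op set, and the restore() function are hypothetical names made
up for illustration, and the snapshot is modeled as a plain dict.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass(frozen=True)
class Txn:
    """Hypothetical, simplified transaction record (not ZooKeeper's format)."""
    zxid: int                 # transaction id, increasing along the log
    op: str                   # "set" or "delete" in this toy model
    path: str
    data: bytes = b""


def restore(snapshot: Dict[str, bytes], snapshot_zxid: int,
            log: Iterable[Txn],
            stop_zxid: Optional[int] = None) -> Dict[str, bytes]:
    """Rebuild state from a snapshot plus log, optionally stopping early."""
    tree = dict(snapshot)     # never mutate the caller's snapshot
    for txn in log:
        if txn.zxid <= snapshot_zxid:
            continue          # already reflected in the snapshot
        if stop_zxid is not None and txn.zxid > stop_zxid:
            break             # point-in-time cutoff reached
        if txn.op == "set":
            tree[txn.path] = txn.data
        elif txn.op == "delete":
            tree.pop(txn.path, None)
    return tree


snap = {"/a": b"1"}
log = [Txn(5, "set", "/a", b"2"), Txn(6, "set", "/b", b"x"),
       Txn(7, "delete", "/a")]
full = restore(snap, 4, log)                  # normal startup: apply all
at_6 = restore(snap, 4, log, stop_zxid=6)     # roll back to zxid 6
```

With stop_zxid left unset this behaves like a normal startup; passing the zxid
noted at checkpoint time gives the state as of that transaction.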

Having this tool will actually simplify your task: you can go back to any
point in time. Think of something like this.

checkpoint A   // stores the last zxid or timestamp from the leader
make changes to zk
// if things fail:
stop zks
rollback A     // run this on each zk; brings the cluster back to its
               // previous state
start zks      // any order should be fine
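
As a rough sketch of what the "rollback A" step might need to decide on each
server, the Python below picks which files to keep for a target zxid. It
assumes file names of the form snapshot.<hex zxid> and log.<hex zxid>, with a
log's suffix being the zxid of its first transaction; plan_rollback() is a
hypothetical helper, not part of ZooKeeper. The last log it keeps may still
hold transactions past the target, so a real tool would also have to truncate
that log or stop the replay at the target.

```python
from typing import List, Tuple


def plan_rollback(filenames: List[str], target_zxid: int) -> Tuple[str, List[str]]:
    """Return (snapshot to keep, logs to keep) for a rollback to target_zxid."""
    snaps, logs = [], []
    for name in filenames:
        prefix, _, suffix = name.partition(".")
        if prefix == "snapshot":
            snaps.append((int(suffix, 16), name))
        elif prefix == "log":
            logs.append((int(suffix, 16), name))

    # Newest snapshot that does not go past the target.
    usable = [s for s in snaps if s[0] <= target_zxid]
    if not usable:
        raise ValueError("no snapshot at or before the target zxid")
    snap_zxid, snap_name = max(usable)

    # Logs that start after the target cannot contain wanted transactions.
    candidates = sorted(l for l in logs if l[0] <= target_zxid)
    # Of the logs starting at or before snap_zxid + 1, only the last one can
    # still hold transactions newer than the snapshot.
    older = [l for l in candidates if l[0] <= snap_zxid + 1]
    newer = [l for l in candidates if l[0] > snap_zxid + 1]
    keep = older[-1:] + newer
    return snap_name, [name for _, name in keep]


files = ["snapshot.0", "log.1", "snapshot.a", "log.b", "snapshot.14", "log.15"]
plan = plan_rollback(files, 0x10)
```

Everything not returned by the helper would be moved aside before the server
is restarted.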


Also keep in mind that a snapshot is fuzzy only if writes happen while the
snapshot is being taken. If you are sure no writes will happen during the
snapshot, then you are good. Experts, please correct me if this is
incorrect.
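
Here is a toy Python illustration of the fuzziness, under the assumption that
every transaction is an idempotent "set path to value" (the real transaction
types are richer). The snapshot is copied node by node while two writes land,
so it captures a state the server never had; replaying the log from the zxid
at which the snapshot started brings it back to a valid state.

```python
def apply(tree, txn):
    """Apply a (zxid, path, value) txn; re-applying it changes nothing."""
    _zxid, path, value = txn
    tree[path] = value


live = {"/a": 1, "/b": 1}             # server state at zxid 1
log = [(2, "/a", 2), (3, "/b", 2)]    # writes that arrive during the snapshot
snap_start_zxid = 1                   # last zxid applied when the snapshot began

snapshot = {}
snapshot["/a"] = live["/a"]           # /a copied before the writes
for txn in log:
    apply(live, txn)                  # both writes land mid-snapshot
snapshot["/b"] = live["/b"]           # /b copied after the writes

# snapshot is now {"/a": 1, "/b": 2}; the server only ever held
# {1, 1}, {2, 1} and {2, 2}, so this state never existed.

recovered = dict(snapshot)
for txn in log:
    if txn[0] > snap_start_zxid:
        apply(recovered, txn)         # zxid 3 is re-applied harmlessly
```

If no writes arrive while the snapshot is taken, the loop in the middle does
nothing and the snapshot is already consistent on its own.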

thanks,
Kishore G


On Mon, Jul 8, 2013 at 9:42 PM, Sergey Maslyakov <evolvah@gmail.com> wrote:

> Kishore,
>
> This sounds like a very elaborate tool. I was trying to find a simplistic
> approach but what Thawan said about "fuzzy snapshots" makes me a little
> afraid that there is no simple solution.
>
>
> On Mon, Jul 8, 2013 at 11:05 PM, kishore g <g.kishore@gmail.com> wrote:
>
> > Agree, we already have such a tool. In fact we use it to reconstruct the
> > sequence of events that led to a failure and actually restore the system
> > to a previous stable point and replay the events. Unfortunately this is
> > tied closely with Helix but it should be easy to make this a generic tool.
> >
> > Sergey, is this something that would be useful in your case?
> >
> > Thanks,
> > Kishore G
> >
> >
> > On Mon, Jul 8, 2013 at 8:09 PM, Thawan Kooburat <thawan@fb.com> wrote:
> >
> > > On restore part, I think having a separate utility to manipulate the
> > > data/snap dir (by truncating the log/removing snapshot to a given zxid)
> > > would be easier than modifying the server.
> > >
> > >
> > > --
> > > Thawan Kooburat
> > >
> > >
> > >
> > >
> > >
> > > On 7/8/13 6:34 PM, "kishore g" <g.kishore@gmail.com> wrote:
> > >
> > > > >I think what we are looking at is point-in-time restore functionality.
> > > > >How about adding a feature that says go back to a specific
> > > > >zxid/timestamp? This way, before making any change to ZooKeeper, simply
> > > > >note down the timestamp/zxid on the leader. If things go wrong after
> > > > >making changes, bring down the ZooKeeper servers and provide an
> > > > >additional zxid/timestamp parameter while restarting. The server can go
> > > > >to the exact point and make it current. The followers can be started
> > > > >blank.
> > > >
> > > >
> > > >
> > > > >On Mon, Jul 8, 2013 at 5:53 PM, Thawan Kooburat <thawan@fb.com> wrote:
> > > >
> > > > >> Just saw that this is the corresponding use case to the question
> > > > >> posted in the dev list.
> > > > >>
> > > > >> In order to restore the data to a given point in time correctly, you
> > > > >> need both snapshot and txnlog. This is because a ZooKeeper snapshot is
> > > > >> fuzzy, and a snapshot alone may not represent a valid state of the
> > > > >> server if there are in-flight requests.
> > > > >>
> > > > >> The 4lw command should cause the server to roll the log and take a
> > > > >> snapshot, similar to the periodic snapshotting operation. Your backup
> > > > >> script needs to grab the snapshot and corresponding txnlog file from
> > > > >> the data dir.
> > > > >>
> > > > >> To restore, just shut down all hosts, clear the data dir, copy over
> > > > >> the snapshot and txnlog, and restart them.
> > > >>
> > > >>
> > > >> --
> > > >> Thawan Kooburat
> > > >>
> > > >>
> > > >>
> > > >>
> > > >>
> > > >> On 7/8/13 3:28 PM, "Sergey Maslyakov" <evolvah@gmail.com> wrote:
> > > >>
> > > > >> >Thank you for your response, Flavio. I apologize, I did not provide a
> > > > >> >clear explanation of the use case.
> > > >> >
> > > > >> >This backup/restore is not intended to be tied to any write event;
> > > > >> >instead, it is expected to run as a periodic (daily?) cron job on one
> > > > >> >of the servers, which is not guaranteed to be the leader of the
> > > > >> >ensemble. There is no expectation that all recent changes are
> > > > >> >committed and persisted to disk. The system can sustain the loss of
> > > > >> >several hours' worth of recent changes in the event of restore.
> > > >> >
> > > > >> >As for finding the leader dynamically and performing the backup on
> > > > >> >it, this approach could be more difficult, as the leader can change
> > > > >> >from time to time and I still need to fetch the file to store it in my
> > > > >> >designated backup location. Taking the backup on one server and
> > > > >> >picking it up from a local file system looks less error-prone. Even if
> > > > >> >I went the fancy route and had Zookeeper send me the serialized
> > > > >> >DataTree in response to the 4lw, this approach would involve a lot of
> > > > >> >moving parts.
> > > >> >
> > > > >> >I have already made a PoC for a new 4lw that invokes takeSnapshot()
> > > > >> >and returns an absolute path to the snapshot it drops on disk. I have
> > > > >> >already protected takeSnapshot() from concurrent invocation, which is
> > > > >> >likely to corrupt the snapshot file on disk. This approach works, but
> > > > >> >I'm thinking of taking it one step further by providing the desired
> > > > >> >path name as an argument to my new 4lw and having the Zookeeper server
> > > > >> >drop the snapshot into the specified file and report success/failure
> > > > >> >back. This way I can avoid cluttering the data directory and
> > > > >> >interfering with what Zookeeper finds when it scans the data
> > > > >> >directory.
> > > >> >
> > > > >> >The approach of having an additional server that would take the
> > > > >> >leadership and populate the ensemble is just a theory. I don't see a
> > > > >> >clean way of making a quorum member the leader of the quorum. Am I
> > > > >> >overlooking something simple?
> > > >> >
> > > > >> >In backup and restore of an ensemble, the biggest unknown for me
> > > > >> >remains populating the ensemble with the desired data. I can think of
> > > > >> >two ways:
> > > > >> >
> > > > >> >1. Clear out all servers by stopping them, purge the version-2
> > > > >> >directories, restore a snapshot file on one server that will be
> > > > >> >brought up first, and then bring up the rest of the ensemble. This way
> > > > >> >I somewhat force the first server to be the leader because it has data
> > > > >> >and it will be the only member of a quorum with data, given the way I
> > > > >> >start the ensemble. This looks like a hack, though.
> > > > >> >
> > > > >> >2. Clear out the ensemble and reload it with a dedicated client using
> > > > >> >the provided Zookeeper API.
> > > > >> >
> > > > >> >With the approach of backing up an actual snapshot file, option #1
> > > > >> >appears to be more practical.
> > > >> >
> > > > >> >I wish I could start the ensemble with a designated leader that would
> > > > >> >bootstrap the ensemble with data, and then the ensemble would go into
> > > > >> >its normal business...
> > > >> >
> > > >> >
> > > >> >
> > > > >> >On Mon, Jul 8, 2013 at 4:30 PM, Flavio Junqueira
> > > > >> ><fpjunqueira@yahoo.com> wrote:
> > > >> >
> > > > >> >> One bit that is still a bit confusing to me in your use case is
> > > > >> >> whether you need to take a snapshot right after some event in your
> > > > >> >> application. Even if you're able to tell ZooKeeper to take a
> > > > >> >> snapshot, there is no guarantee that it will happen at the exact
> > > > >> >> point you want it if update operations keep coming.
> > > > >> >>
> > > > >> >> If you use your four-letter word approach, then would you search
> > > > >> >> for the leader or would you simply take a snapshot at any server?
> > > > >> >> If it has to go through the leader so that you make sure to have
> > > > >> >> the most recent committed state, then it might not be a bad idea to
> > > > >> >> have an api call that tells the leader to take a snapshot at some
> > > > >> >> directory of your choice. Informing you of the name of the snapshot
> > > > >> >> file so that you can copy it sounds like an option, but perhaps it
> > > > >> >> is not as convenient.
> > > > >> >>
> > > > >> >> The approach of adding another server is not very clear. How do you
> > > > >> >> force it to be the leader? Keep in mind that if it crashes, then it
> > > > >> >> will lose leadership.
> > > >> >>
> > > >> >> -Flavio
> > > >> >>
> > > > >> >> On Jul 8, 2013, at 8:34 AM, Sergey Maslyakov <evolvah@gmail.com>
> > > > >> >> wrote:
> > > >> >>
> > > > >> >> > It looks like the "dev" mailing list is rather inactive. Over
> > > > >> >> > the past few days I only saw several automated emails from
> > > > >> >> > JIRA and this is pretty much it. Contrary to this, the "user"
> > > > >> >> > mailing list seems to be more alive and more populated.
> > > > >> >> >
> > > > >> >> > With this in mind, please allow me to cross-post here the
> > > > >> >> > message I sent to the "dev" list a few days ago.
> > > >> >> >
> > > >> >> >
> > > >> >> > Regards,
> > > >> >> > /Sergey
> > > >> >> >
> > > >> >> > === forwarded message begins here ===
> > > >> >> >
> > > >> >> > Hi!
> > > >> >> >
> > > > >> >> > I'm facing a problem that has been raised by multiple people,
> > > > >> >> > but none of the discussion threads seems to provide a good
> > > > >> >> > answer. I dug into the Zookeeper source code trying to come up
> > > > >> >> > with some possible approaches and I would like to get your
> > > > >> >> > input on those.
> > > >> >> >
> > > >> >> > Initial conditions:
> > > >> >> >
> > > > >> >> > * I have an ensemble of five Zookeeper servers running v3.4.5
> > > > >> >> >   code.
> > > > >> >> > * The size of a committed snapshot file is in the vicinity of
> > > > >> >> >   1GB.
> > > > >> >> > * There are about 80 clients connected to the ensemble.
> > > > >> >> > * Clients are heavily read-biased, i.e., they mostly read and
> > > > >> >> >   rarely write. I would say less than 0.1% of queries modify
> > > > >> >> >   the data.
> > > >> >> >
> > > >> >> > Problem statement:
> > > >> >> >
> > > > >> >> > * Under certain conditions, I may need to revert the data
> > > > >> >> >   stored in the ensemble to an earlier state. For example, one
> > > > >> >> >   of the clients may ruin the application-level data integrity
> > > > >> >> >   and I need to perform a disaster recovery.
> > > >> >> >
> > > > >> >> > Things look nice and easy if I'm dealing with a single
> > > > >> >> > Zookeeper server. A file-level copy of the data and dataLog
> > > > >> >> > directories should allow me to recover later by stopping
> > > > >> >> > Zookeeper, swapping the corrupted data and dataLog directories
> > > > >> >> > with a backup, and firing Zookeeper back up.
> > > > >> >> >
> > > > >> >> > Now, the ensemble deployment and the leader election algorithm
> > > > >> >> > in the quorum make things much more difficult. In order to
> > > > >> >> > restore from a single file-level backup, I need to take the
> > > > >> >> > whole ensemble down, wipe out the data and dataLog directories
> > > > >> >> > on all servers, replace these directories with backed-up
> > > > >> >> > content on one of the servers, bring this server up first, and
> > > > >> >> > then bring up the rest of the ensemble. This [somewhat]
> > > > >> >> > guarantees that the populated Zookeeper server becomes a member
> > > > >> >> > of a majority and populates the ensemble. This approach works
> > > > >> >> > but it is very involved and, thus, prone to human error.
> > > >> >> >
> > > > >> >> > Based on a study of the Zookeeper source code, I am considering
> > > > >> >> > the following alternatives, and I seek advice from the
> > > > >> >> > Zookeeper development community as to which approach looks
> > > > >> >> > more promising or whether there is a better way.
> > > >> >> >
> > > >> >> > Approach #1:
> > > >> >> >
> > > > >> >> > Develop a complementary pair of utilities for export and import
> > > > >> >> > of the data. Both utilities will act as Zookeeper clients and
> > > > >> >> > use the existing API. The "export" utility will recursively
> > > > >> >> > retrieve data and store it in a file. The "import" utility will
> > > > >> >> > first purge all data from the ensemble and then reload it from
> > > > >> >> > the file.
> > > > >> >> >
> > > > >> >> > This approach seems to be the simplest and there are similar
> > > > >> >> > tools developed already. For example, the Guano Project:
> > > > >> >> > https://github.com/d2fn/guano
> > > >> >> >
> > > >> >> > I don't like two things about it:
> > > > >> >> > * Poor performance even on a backup for a data store of my
> > > > >> >> >   size.
> > > > >> >> > * Possible data consistency issues due to concurrent access by
> > > > >> >> >   the export utility as well as other "normal" clients.
> > > >> >> >
> > > >> >> > Approach #2:
> > > >> >> >
> > > > >> >> > Add another four-letter command that would force rolling up the
> > > > >> >> > transactions and creating a snapshot. The result of this
> > > > >> >> > command would be a new snapshot.XXXX file on disk and the name
> > > > >> >> > of the file could be reported back to the client as a response
> > > > >> >> > to the four-letter command. This way, I would know which
> > > > >> >> > snapshot file to grab for a possible future restore. But
> > > > >> >> > restoring from a snapshot file is almost as involved as the
> > > > >> >> > error-prone sequence described in the "Problem statement"
> > > > >> >> > above.
> > > >> >> >
> > > >> >> > Approach #3:
> > > >> >> >
> > > > >> >> > Come up with a way to temporarily add a new Zookeeper server
> > > > >> >> > into a live ensemble that would take over (how?) the leader
> > > > >> >> > role and push out the snapshot that it has to all ensemble
> > > > >> >> > members upon restore. This approach could be difficult and
> > > > >> >> > error-prone to implement because it will require hacking the
> > > > >> >> > existing election algorithm to designate a leader.
> > > >> >> >
> > > > >> >> > So, which of the approaches do you think works best for an
> > > > >> >> > ensemble and for a database size of about 1GB?
> > > >> >> >
> > > >> >> >
> > > >> >> > Any advice will be highly appreciated!
> > > >> >> > /Sergey
> > > >> >>
> > > >> >>
> > > >>
> > > >>
> > >
> > >
> >
>
