zookeeper-user mailing list archives

From jack ma <jackma1...@gmail.com>
Subject Re: Zookeeper ensemble backup questions?
Date Fri, 19 Jul 2013 22:28:51 GMT
Thanks Sergey.

That is great. Did you contribute your work back to zookeeper? When you take
a snapshot, did you have to block zookeeper from accepting requests?


On Fri, Jul 19, 2013 at 12:29 PM, Sergey Maslyakov <evolvah@gmail.com> wrote:

> A word of preemptive self-defense: I am not an experienced Java developer.
> Please, don't throw rotten eggs at me if I did not follow well-known Java
> coding patterns :)
>
>
> Regards,
> /Sergey
>
>
> On Fri, Jul 19, 2013 at 2:15 PM, Sergey Maslyakov <evolvah@gmail.com>
> wrote:
>
> > I can share this patch based on 3.4.5, which does the trick.
> >
> > It adds a "snps" 4lw command that accepts one mandatory argument, which
> > is an absolute path for the directory where the snapshot file will be
> > dropped. The "absoluteness" of the path is verified by UNIX rules. Not
> > sure how it would work in Windows, though. The target directory must
> > exist and be writeable by the effective UID of the Zookeeper server.
> >
> > If the operation was successful, the Zookeeper server responds with the
> > absolute path of the snapshot file. You can watch for the '/' character
> > at the start of the response to trigger your reaction to it.
> >
> > In my case, a 700MB snapshot takes about 30 seconds to write out.
> >
> > Please see several examples below:
> >
> > ~ $ mkdir /tmp/snapshot-test
> >
> > ~ $ telnet localhost 12181
> > Trying 127.0.0.1...
> > Connected to localhost.
> > Escape character is '^]'.
> > snps /tmp/snapshot-test
> > /tmp/snapshot-test/snapshot.316c8
> > Connection to localhost closed by foreign host.
> >
> > ~ $ ls -al /tmp/snapshot-test/snapshot.316c8
> > -rw-r--r--   1 srvr     srvr     719602373 Jul 19 14:09
> > /tmp/snapshot-test/snapshot.316c8
> >
> > ~ $ telnet localhost 12181
> > Trying 127.0.0.1...
> > Connected to localhost.
> > Escape character is '^]'.
> > snps blah
> > Snapshot directory path must be absoulte, i.e., it must start with '/'.
> > Path "blah" does not meet the criteria.
> > Connection to localhost closed by foreign host.
> >
> > ~ $ telnet localhost 12181
> > Trying 127.0.0.1...
> > Connected to localhost.
> > Escape character is '^]'.
> > snps /tmp/blah
> > Error while serializing snapshot into /tmp/blah/snapshot.316c8.
> > /tmp/blah/snapshot.316c8 (No such file or directory)
> > Connection to localhost closed by foreign host.
> >
> > ~ $ telnet localhost 12181
> > Trying 127.0.0.1...
> > Connected to localhost.
> > Escape character is '^]'.
> > snps
> > Snapshot directory path must be absoulte, i.e., it must start with '/'.
> > Path "" does not meet the criteria.
> > Connection to localhost closed by foreign host.
> >
> > ~ $
> >
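A minimal client sketch for the exchange shown in the telnet sessions above. It only works against a server carrying the "snps" patch described in this message; stock 3.4.5 has no such command. The function names are hypothetical, and the newline terminator after the argument is an assumption based on what telnet sends.

```python
import socket


def send_four_letter_word(host, port, command, timeout=60.0):
    """Send one command to the ZooKeeper client port and return the reply.

    The server closes the connection after answering (as in the telnet
    sessions above), so the reply is everything read until EOF.
    """
    with socket.create_connection((host, port), timeout=timeout) as sock:
        # Assumption: the patched server reads the argument up to a newline,
        # which is what telnet sends when you press Enter.
        sock.sendall(command.encode("ascii") + b"\n")
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace").strip()


def parse_snps_reply(reply):
    """Return the snapshot path on success, or raise with the server's text.

    A successful reply is the absolute path of the snapshot file, so it
    starts with '/'. The error replies shown above start with other text.
    """
    if reply.startswith("/"):
        return reply
    raise RuntimeError("snapshot failed: " + reply)
```

With the port from the sessions above, `parse_snps_reply(send_four_letter_word("localhost", 12181, "snps /tmp/snapshot-test"))` would return the snapshot path or raise with the server's error text. The generous default timeout reflects the roughly 30 seconds reported for a 700MB snapshot.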
> >
> >
> >
> > On Fri, Jul 19, 2013 at 1:42 PM, jack ma <jackma1402@gmail.com> wrote:
> >
> >> Thanks Sergey.
> >>
> >> That is a great improvement idea for zookeeper. I think that zookeeper
> >> is planning to add a new 4lw command "snap", but it is not ready yet.
> >>
> >> My original questions are based on the current version of zookeeper
> >> (3.4.5); do you have any answers for them?
> >>
> >> I appreciate the help.
> >>
> >> thanks
> >> Jack
> >>
> >>
> >>
> >>
> >> On Fri, Jul 19, 2013 at 11:19 AM, Sergey Maslyakov <evolvah@gmail.com>
> >> wrote:
> >>
> >> > Jack,
> >> >
> >> > Here is how I see the backup process happening.
> >> >
> >> > 1. Zookeeper server can be changed to support a new 4lw that will
> >> > write out the current state of the DataTree into a snapshot file with
> >> > the path and name provided as an argument to this new command (barring
> >> > all the permissions, disk space, and other system-level restrictions).
> >> > Probably, I would ask Zookeeper to save the snapshot in a directory
> >> > outside of the standard "dataLog" for the sake of cleanliness.
> >> >
> >> > 2. When Zookeeper server responds to the new "snapshot" command with a
> >> > success indication, the requesting process knows that the file has
> >> > been written out and it can go and process it. It can add some
> >> > metadata and create an archive to store it somewhere, for example.
> >> > Alternatively, Zookeeper server could stream the data it would have
> >> > written into a snapshot as the response to the new "snapshot" command.
> >> > This way, the client becomes responsible for persistence and this
> >> > lifts a number of permission-related issues (but raises some other
> >> > issues too). Oh, and by the way, it looks like snapshot files are
> >> > rather compressible. I have seen compression factors of 20 and more on
> >> > the data that I have.
> >> >
> >> > 3. Disk cleanups are performed.
> >> >
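A minimal sketch of the "create an archive" part of step 2, assuming the snapshot file has already been written out by the server. The function name and the ".gz" naming are hypothetical choices, and the compression ratio you get depends entirely on your data.

```python
import gzip
import shutil
from pathlib import Path


def compress_snapshot(snapshot_path, archive_dir):
    """Gzip a snapshot file into archive_dir and return the archive path.

    The data is streamed in chunks, so a large snapshot is never held in
    memory as a whole. The original snapshot file is left untouched.
    """
    src = Path(snapshot_path)
    dest = Path(archive_dir) / (src.name + ".gz")
    with src.open("rb") as fin, gzip.open(dest, "wb") as fout:
        shutil.copyfileobj(fin, fout)
    return dest
```

For example, `compress_snapshot("/tmp/snapshot-test/snapshot.316c8", "/some/archive/dir")` would produce `snapshot.316c8.gz` in the archive directory, which here is a placeholder path. Removing the uncompressed file afterwards belongs to the cleanup in step 3.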
> >> > With this backup procedure the restore would turn into:
> >> >
> >> > 1. Stopping all ensemble members
> >> >
> >> > 2. Wiping out dataDir/version-2 and dataLogDir/version-2
> >> >
> >> > 3. Restoring the snapshot taken by the above backup procedure on one
> >> > of the servers into dataDir/version-2
> >> >
> >> > 4. Bringing this server online
> >> >
> >> > 5. Allowing some time for it to load the snapshot. You could send the
> >> > "isro" 4lw command to it to see when it stops responding with "null".
> >> > When the response becomes "ro" or "rw", it is ready to populate the
> >> > others with its own data
> >> >
> >> > 6. Bringing up the other servers one by one, to allow them to form a
> >> > quorum with the populated server
> >> >
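Step 5 of the restore can be sketched as a polling loop. "isro" is a standard four-letter word in 3.4.x; the function names, retry count, and delay below are hypothetical, and treating an unreachable server the same as a "null" reply is an assumption of this sketch.

```python
import socket
import time


def isro(host, port, timeout=5.0):
    """Send 'isro' and return the reply text, or None if unreachable."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(b"isro")
            chunks = []
            while True:
                data = sock.recv(64)
                if not data:
                    break
                chunks.append(data)
        return b"".join(chunks).decode("ascii", errors="replace").strip()
    except OSError:
        # Covers refused connections and timeouts while the server starts.
        return None


def wait_until_serving(probe, attempts=60, delay=5.0, sleep=time.sleep):
    """Call probe() until it returns "ro" or "rw".

    Returns True once the server reports one of those states, or False
    if it never does within the given number of attempts.
    """
    for _ in range(attempts):
        if probe() in ("ro", "rw"):
            return True
        sleep(delay)
    return False
```

A call such as `wait_until_serving(lambda: isro("localhost", 12181))` would block until the restored server answers "ro" or "rw", after which the other servers can be started as in step 6. The port is the one from the telnet examples earlier in the thread.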
> >> >
> >> > Hope this helps! I'd be glad to hear from people who know the
> >> > internals of Zookeeper server better whether this approach is flawed or
> >> > robust.
> >> >
> >> >
> >> > Regards,
> >> > /Sergey
> >> >
> >> >
> >> > On Fri, Jul 19, 2013 at 1:00 PM, jack ma <jackma1402@gmail.com> wrote:
> >> >
> >> > > I asked those questions in the thread
> >> > >
> >> > > http://mail-archives.apache.org/mod_mbox/zookeeper-user/201307.mbox/%3cCAB+cfdwhOV0JfB04=MpO_+i-4ou=VbL=EG2XS557+j+698jx3A@mail.gmail.com%3e
> >> > >
> >> > > but there was no response.
> >> > >
> >> > > So I am posting those questions again here, hoping to get help
> >> > > from the community.
> >> > >
> >> > > I want to make sure I fully understand the procedures for zookeeper
> >> > > backup and disaster recovery:
> >> > >
> >> > > For the backup procedure on a zookeeper ensemble:
> >> > > (1) Log in to any host whose state is "Serving"
> >> > >            Question:
> >> > >                   Do I have to log in to the leader node, or is any
> >> > >                   node OK?
> >> > > (2) Copy the latest snapshot file and transaction log from the
> >> > >     version-2 directory.
> >> > >            Question:
> >> > >                   How do we make sure we do not copy corrupt files if
> >> > >                   the snapshot/transaction log is in the middle of an
> >> > >                   update? Do we have to shut down the node to make the
> >> > >                   copy?
> >> > >                   Besides the transaction log and snapshot, do we have
> >> > >                   to copy other files, such as the epoch files?
> >> > >
> >> > > For the disaster recovery procedure on a zookeeper ensemble:
> >> > > (1) Recreate the machines for the zookeeper ensemble
> >> > > (2) Copy the snapshot/transaction log we backed up into the zookeeper
> >> > >     dataDir/version-2 and dataLogDir/version-2.
> >> > >            Question:
> >> > >                  Do we have to copy the epoch files?
> >> > >                  Do we have to copy the backed-up snapshot/transaction
> >> > >                  log to all the zookeeper nodes, or just to the first
> >> > >                  node we start?
> >> > >
> >> > > Appreciate your time and help.
> >> > > Jack
> >> > >
> >> >
> >>
> >
> >
>
