zookeeper-user mailing list archives

From Ismael Juma <ism...@juma.me.uk>
Subject Re: FYI - Apache ZooKeeper Backup, a Treatise
Date Thu, 16 Jun 2016 21:47:42 GMT
Hi Jordan,

Kafka stores ACLs as well as client and topic configs in ZooKeeper, so that
lends credence to your argument, I think.

Ismael

On Thu, Jun 16, 2016 at 11:41 PM, Jordan Zimmerman <jordan@jordanzimmerman.com> wrote:

> Contrary to recommendations everywhere, my experience is that almost
> everyone is storing source of truth data in ZooKeeper. It’s just too
> tempting. You have a distributed file system just sitting there and it’s
> too easy to use. You get a lot of great features like watches, etc. People
> are using it to store configuration data, sequence numbers, etc. They are
> storing these things without a good means of reproducing them in case of a
> catastrophic outage. Further, I’ve heard of several orgs who just back up
> the transaction logs and think they can restore them for DR. Anyway, that’s
> the genesis of my blog post.
>
> -Jordan
>
> > On Jun 16, 2016, at 2:39 PM, Chris Nauroth <cnauroth@hortonworks.com> wrote:
> >
> > Yes, thank you to Jordan for the article!
> >
> > Like Flavio, I personally have never come across the requirement for
> > ZooKeeper backups.  I've generally followed the pattern that data stored
> > in ZooKeeper is truly transient, and applications are built either to
> > tolerate loss of that data or reconstruct it from first principles if it
> > goes missing.  Adding observers in a second data center would give a
> > rudimentary approximation of off-site backup in the case of a data center
> > disaster, with the usual caveats around propagation delays.
> >
> > Jordan, I'd be curious if you can share more specific details about the
> > kind of data that you have that necessitates a backup/restore.  (If you're
> > not at liberty to share this, then I can understand that.)  It might
> > inform if we have a motivating use case for backup/restore features within
> > ZooKeeper, such as some of the transaction log filtering that the article
> > mentions.
> >
> > --Chris Nauroth
> >
> >
> >
> >
> > On 6/16/16, 1:03 AM, "Flavio Junqueira" <fpj@apache.org> wrote:
> >
> >> Great write-up, Jordan, thanks!
> >>
> >> Whether to back up zk data or not is possibly an open topic for this
> >> community, even though we have discussed it at times. My sense has been
> >> that precisely because of the issues you mention in your post, it is
> >> typically best to have a way to recreate its data upon a disaster rather
> >> than back up the data. I think there could be three general scenarios in
> >> which folks would prefer to back up data, but correct me if these
> >> aren't accurate:
> >>
> >> - The data in zk isn't elsewhere, so it can't be recreated: zk isn't a
> >> regular database, so I'd think it is best not to store data and focus on
> >> cluster data or metadata.
> >> - There is just a lot of data and I'd rather have a shorter time to
> >> recover: zk in general shouldn't have that much data in db, but let's go
> >> with the assumption that for the requirements of the application it is a
> >> lot. For such a case, it probably depends on whether your application can
> >> efficiently and effectively recover from a backup. Basically, as pointed
> >> out in the post, the data could be inconsistent and cause trouble if you
> >> don't think about the corner cases.
> >> - The code to recreate the zk metadata for my application is super
> >> complex: if you decide to code against zk, it is good to think about
> >> whether reconstructing in the case of a disaster is doable and, if it is,
> >> to design and implement the reconstruction of the state upon a disaster.
> >>
> >> Also, we typically provision enough replicas, often replicating across
> >> data centers, to make sure that the data isn't all gone. Having more
> >> replicas does not completely rule out the possibility of a disaster, but
> >> in such rare cases we resort to the expensive path.
> >>
> >> I personally have never worked with an application that was taking
> >> backups of zk data in prod, so I'm really interested in what others
> >> think.
> >>
> >> -Flavio
> >>
> >>
> >>> On 16 Jun 2016, at 00:43, Jordan Zimmerman <jordan@jordanzimmerman.com> wrote:
> >>>
> >>> FYI - I wrote a blog about backing up ZooKeeper:
> >>>
> >>> https://www.elastic.co/blog/zookeeper-backup-a-treatise
> >>>
> >>> -Jordan
> >>
> >
>
>
