zookeeper-user mailing list archives

From Jordan Zimmerman <jzimmer...@netflix.com>
Subject Re: Backups
Date Thu, 19 Jan 2012 19:07:24 GMT
It's that very replication that creates the need for backups. If there is
a user error or a bad injection of data, the error will quickly replicate
to all the instances. There's no way to recover without an external backup.


-JZ
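
For readers who want to try an external backup along these lines, here is a minimal sketch of choosing which files to copy. It assumes the usual on-disk naming in the data directory (`snapshot.<zxid in hex>` and `log.<zxid in hex>`), where a log file's suffix is the first zxid written to it. The function name `files_to_back_up` is hypothetical and not part of ZooKeeper. As the quoted discussion below notes, a copy taken from a running server can still miss the most recent updates.

```python
import re
from typing import List

# Matches ZooKeeper-style data file names, e.g. "snapshot.1a2b" or "log.1a2c".
_NAME = re.compile(r"^(snapshot|log)\.([0-9a-fA-F]+)$")


def files_to_back_up(names: List[str]) -> List[str]:
    """Pick the newest snapshot plus the transaction logs that may follow it.

    Hypothetical helper: `names` is a plain list of file names, so the
    selection logic can be checked without touching a real data directory.
    """
    snaps, logs = [], []
    for name in names:
        m = _NAME.match(name)
        if not m:
            continue  # ignore unrelated files
        zxid = int(m.group(2), 16)  # the suffix is a hexadecimal zxid
        (snaps if m.group(1) == "snapshot" else logs).append((zxid, name))
    if not snaps:
        return []

    snap_zxid, snap_name = max(snaps)  # newest snapshot by zxid
    logs.sort()

    # A log is named after the first zxid it holds, so the last log that
    # starts at or before the snapshot may contain later transactions.
    # Keep that log and every log after it.
    earlier = [z for z, _ in logs if z <= snap_zxid]
    start = max(earlier) if earlier else 0
    return [snap_name] + [n for z, n in logs if z >= start]
```

It could be called as `files_to_back_up(os.listdir(data_dir))`, where `data_dir` stands for wherever your server keeps its snapshots and logs; the returned names are the ones to copy to external storage such as S3.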


On 1/19/12 10:39 AM, "Flavio Junqueira" <fpj@yahoo-inc.com> wrote:

>Hi Ted, Znodes for leader election, group membership, etc, can all be
>recreated, so why should I back them up instead of recreating the
>znodes? In fact, one might bring back a previous snapshot of the
>system that reflects an incorrect system state.
>
>In the case that one stores data that can't be recovered by other
>means, I understand the need, but then we have the durability problem
>that I mentioned and you apparently agreed with. Also, ZooKeeper is a
>replicated service, so why can't you simply rely upon the replication
>strategy that ZooKeeper provides to you already? Again, I'm trying to
>understand the use cases here.
>
>Thanks,
>-Flavio
>
>On Jan 19, 2012, at 7:11 PM, Ted Dunning wrote:
>
>> A backup can still be useful.  It is a common property that a database
>> backup is known to be slightly out of date.
>>
>> Such a backup can still be very useful.  In many systems, the most
>> common
>> cause of error is simple human intervention.  This especially
>> applies to
>> file systems and databases, but can still apply to ZK if an admin
>> carelessly tries to clean up part of the namespace and accidentally
>> cleans
>> up all of it.  This should be much less common with ZK because manual
>> adjustments are so much less a part of standard operation, but they
>> can
>> still occur.  In these cases, an out-of-date backup may be enormously
>> valuable.
>>
>> If somebody wants a precise backup from a particular moment in time,
>> the
>> best option is to use the snapshot capabilities exposed by various
>> file
>> systems.  Traditional NAS vendors all support this.  At a lower cost
>> and
>> complexity point, you can get this from MapR clusters exposed as NFS
>> or by
>> a ZFS file system.  This option also allows you to keep multiple
>> snapshots
>> from points in the past.
>>
>> What Jordan is doing would allow backups without special storage
>> devices
>> and, with good backup of the log, would allow nearly current
>> recovery in
>> the event of catastrophic loss.  Yes, this loses some durability,
>> but it is
>> still very desirable.
>>
>> On Thu, Jan 19, 2012 at 11:07 AM, Flavio Junqueira <fpj@yahoo-inc.com> wrote:
>>
>>> Since you started this thread, I've been thinking about the idea of
>>> backing up, and I'm not sure I understand the motivation and if it
>>> is ok to
>>> violate safety properties.
>>>
>>> Given that ZooKeeper is used for coordination, I would think that
>>> in many
>>> cases all its state can be reconstructed in an algorithmic manner.
>>> Perhaps
>>> the use case for a backup would be the one in which it is being
>>> used as a
>>> database, for example, to keep the metadata of a file system.
>>> Periodic
>>> backups or even keeping an observer, however, won't guarantee that
>>> if you
>>> bring the system up using that backup you'll have all committed
>>> operations.
>>> The state of the leader reflects all committed operations, but one
>>> needs to
>>> have the latest state of the transaction log to not miss an update.
>>>
>>> But, it is true that I'm assuming that you can't miss updates. If
>>> you can
>>> miss updates, then that's a different story. By missing updates
>>> we'll be
>>> violating durability, which is a property that ZooKeeper is
>>> supposed to
>>> provide, so I'm trying to understand in which cases violating
>>> durability
>>> would be acceptable. If it is not acceptable and you still want to
>>> have a
>>> backup, then I don't see a way other than shutting down the clients
>>> before
>>> you take a backup, which doesn't seem to be what is being proposed
>>> here.
>>>
>>> -Flavio
>>>
>>>
>>> On Jan 18, 2012, at 1:38 AM, Jordan Zimmerman wrote:
>>>
>>> Neha - can you send me your email address. Send it to:
>>>> jzimmerman@netflix.com
>>>>
>>>> On 1/17/12 10:10 AM, "Neha Narkhede" <neha.narkhede@gmail.com>
>>>> wrote:
>>>>
>>>> Jordan,
>>>>>
>>>>> I'd be interested in previewing it. Let me know.
>>>>>
>>>>> Thanks,
>>>>> Neha
>>>>>
>>>>> On Mon, Jan 16, 2012 at 5:42 PM, Jordan Zimmerman
>>>>> <jzimmerman@netflix.com> wrote:
>>>>>
>>>>>> We'll be backing up to S3. Wouldn't it be redundant to back up
>>>>>> all the
>>>>>> instances?
>>>>>>
>>>>>> -JZ
>>>>>>
>>>>>> P.S. I'm working on a ZooKeeper instance manager that will have
>>>>>> backup/restore and a bunch of other stuff. We'll be open
>>>>>> sourcing it. If
>>>>>> anyone is interested in previewing it let me know.
>>>>>>
>>>>>>
>>>>>> On 1/16/12 5:39 PM, "Patrick Hunt" <phunt@apache.org> wrote:
>>>>>>
>>>>>> Why would you limit to the leader? Wouldn't backing up any
>>>>>> server (as
>>>>>>> long as it's active) be sufficient? If you search the list it's
>>>>>>> been
>>>>>>> discussed before, using Observers seemed like a reasonable
>>>>>>> option as
>>>>>>> well.
>>>>>>>
>>>>>>> Patrick
>>>>>>>
>>>>>>> On Fri, Jan 13, 2012 at 2:29 PM, Jordan Zimmerman
>>>>>>> <jzimmerman@netflix.com> wrote:
>>>>>>>
>>>>>>>> That's easy as the backup app is running on the same machine
>>>>>>>> as the ZK
>>>>>>>> instance. I can use 'stat' to see if "my" instance is the
>>>>>>>> leader.
>>>>>>>>
>>>>>>>> On 1/13/12 2:28 PM, "Camille Fournier" <camille@apache.org>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> You want to have to figure out who the leader is every time you want
>>>>>>>>> to take a backup? That would be the downside to this strategy I
>>>>>>>>> would think.
>>>>>>>>>
>>>>>>>>> C
>>>>>>>>>
>>>>>>>>> From my phone
>>>>>>>>> On Jan 13, 2012 5:24 PM, "Jordan Zimmerman"
>>>>>>>>> <jzimmerman@netflix.com> wrote:
>>>>>>>>>
>>>>>>>>> As a backup strategy, it seems I would only want to backup
>>>>>>>>> snapshots
>>>>>>>>>> from
>>>>>>>>>> the leader. Does that make sense?
>>>>>>>>>>
>>>>>>>>>> -JZ
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>> flavio
>>> junqueira
>>>
>>> research scientist
>>>
>>> fpj@yahoo-inc.com
>>> direct +34 93-183-8828
>>>
>>> avinguda diagonal 177, 8th floor, barcelona, 08018, es
>>> phone (408) 349 3300    fax (408) 349 3301
>>>
>>>
>
>
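
The quoted thread mentions using 'stat' to see whether the local instance is the leader. A minimal sketch of that check follows, assuming the server accepts the four-letter command `stat` on its client port (2181 is used here as an assumed default) and that the reply contains a `Mode:` line. The helper names `parse_mode`, `query_stat`, and `is_leader` are hypothetical. Newer releases may require the command to be enabled in the server configuration before it responds.

```python
import socket
from typing import Optional


def parse_mode(stat_output: str) -> Optional[str]:
    """Return the value of the 'Mode:' line in 'stat' output, or None."""
    for line in stat_output.splitlines():
        if line.startswith("Mode:"):
            return line.split(":", 1)[1].strip()
    return None


def query_stat(host: str = "localhost", port: int = 2181,
               timeout: float = 5.0) -> str:
    """Send the four-letter command 'stat' and return the raw text reply."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(b"stat")
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # the server closes the connection after replying
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def is_leader(host: str = "localhost", port: int = 2181) -> bool:
    """True if the instance at host:port reports itself as the leader."""
    return parse_mode(query_stat(host, port)) == "leader"
```

A backup job running next to each instance could call `is_leader()` first and skip the run otherwise. As Patrick points out in the thread, backing up any active server may be sufficient, so this check is optional.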

