zookeeper-user mailing list archives

From Jordan Zimmerman <jor...@jordanzimmerman.com>
Subject Re: How to restore a snapshot after an accidental ZKclenup
Date Mon, 22 Feb 2016 18:17:39 GMT
Imagine a scenario in an unstable ensemble where the ZK leader is moving around. What server
are you getting your transaction logs from? How old are they? There is a potential to go back
in time. For persistent nodes this probably isn’t a big deal. But, what about ephemeral
nodes involved in a lock recipe? Remember, writes only go to a Quorum of servers.
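
One rough way to answer "how old are they?" is to compare the zxid suffix in the snapshot and log file names on each server. What follows is a minimal sketch, assuming the usual `snapshot.<hex zxid>` / `log.<hex zxid>` naming and the zxid layout of epoch in the high 32 bits and counter in the low 32 bits. The server names and the `zk2`/`zk3` file names are made up for illustration; only `snapshot.d0002bf88` appears in this thread.

```python
def parse_zxid(filename):
    """Split a name like 'snapshot.d0002bf88' into (epoch, counter)."""
    prefix, _, suffix = filename.partition(".")
    if prefix not in ("snapshot", "log") or not suffix:
        raise ValueError(f"not a snapshot/log file name: {filename!r}")
    zxid = int(suffix, 16)
    # High 32 bits: leader epoch. Low 32 bits: counter within that epoch.
    return zxid >> 32, zxid & 0xFFFFFFFF


def newest(files_by_server):
    """Return the (server, filename) pair whose name carries the highest zxid."""
    pairs = (
        (server, name)
        for server, names in files_by_server.items()
        for name in names
    )
    return max(pairs, key=lambda pair: parse_zxid(pair[1]))


# Hypothetical directory listings; only the zk1 file name comes from the thread.
files_by_server = {
    "zk1": ["snapshot.d0002bf88"],
    "zk2": ["snapshot.d0002bf10"],
    "zk3": ["snapshot.c0001a000"],
}

print(parse_zxid("snapshot.d0002bf88"))  # (13, 180104)
print(newest(files_by_server))           # ('zk1', 'snapshot.d0002bf88')
```

This only tells you the relative age of the files. It does not tell you whether the server they came from had seen every write the ensemble acknowledged.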

Imagine this wholly contrived scenario:

1. Your clients execute a leader election recipe and a leader is selected
2. The leader writes a ZNode as a type of flag (note: in practice there are issues with this,
but don’t worry about it now)
3. The leader executes a REST call to a third party server that starts something important
that is not idempotent
4. The ensemble has a horrid crash at this point

Now, you want to restore from a backup. How old is the backup? What server did the backup
come from? There’s a very good chance that you restore from backup and the ZNode written
in step 2 is not in the transaction log you restored. Now, your leader is going to send that
REST call again. Even if the ZNode is recorded, old ephemerals may appear again and the leader
might think it’s the leader a second time. There are so many vagaries that it’s difficult to
reason about.
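
The scenario above can be sketched as a toy in-memory model. This is not ZooKeeper code: `Server`, `quorum_write`, `leader_step`, the `/flag` path and the `rest_calls` list are all invented for illustration. The only behavior it borrows from ZK is that a write succeeds once a majority of servers have logged it, so one server can be missing it.

```python
class Server:
    """Toy stand-in for one ensemble member: just a transaction log."""

    def __init__(self, name):
        self.name = name
        self.log = []  # (path, value) entries in commit order


def quorum_write(ensemble, reachable, path, value):
    """Append the write on reachable servers; require a strict majority."""
    if len(reachable) <= len(ensemble) // 2:
        raise RuntimeError("no quorum")
    for server in reachable:
        server.log.append((path, value))


def leader_step(ensemble, reachable, rest_calls):
    """Steps 2 and 3: write the flag ZNode, then make the non-idempotent call."""
    if "/flag" in dict(reachable[0].log):
        return False  # flag is visible, so the call was already made
    quorum_write(ensemble, reachable, "/flag", "started")
    rest_calls.append("POST /start")  # stands in for the third-party REST call
    return True


zk1, zk2, zk3 = Server("zk1"), Server("zk2"), Server("zk3")
ensemble = [zk1, zk2, zk3]
rest_calls = []

# Steps 1-3: zk3 is lagging, so the flag lands on zk1 and zk2 only.
leader_step(ensemble, [zk1, zk2], rest_calls)

# Step 4: crash. The backup happens to come from zk3, which never saw the flag.
backup = list(zk3.log)
for server in ensemble:
    server.log = list(backup)

# After the restore the flag is gone, so the leader makes the call again.
leader_step(ensemble, ensemble, rest_calls)
print(rest_calls)  # ['POST /start', 'POST /start']
```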

Again, this is highly contrived, but you can imagine many similar types of scenarios. ZK is a
coordinator, not a database.

-JZ

> On Feb 22, 2016, at 9:58 AM, AALISHE <aalishe@gmail.com> wrote:
> 
> Thanks Jordan... can you elaborate more on your answer
> On Feb 22, 2016 7:43 PM, "Jordan Zimmerman" <jordan@jordanzimmerman.com>
> wrote:
> 
>> Be careful when restoring that you don’t go “back in time”. ZooKeeper can
>> be used as a datastore (bad idea) and a coordinator. If the transaction
>> files you are restoring contain paths that are involved in leader
>> elections, etc. insanity can ensue.
>> 
>>> On Feb 22, 2016, at 9:05 AM, vikrant singh <vikrant.thakur@gmail.com> wrote:
>>> 
>>> I think you need not worry about the leader election or who was the
>>> previous leader. The quorum should be able to handle it when it comes up.
>>> Nor do you need to validate who becomes the new leader.
>>> 
>>> Before you delete any files, please make sure you keep a backup, so that if
>>> your experiment fails you do not end up with no files to try again.
>>> 
>>> That said, once all data is backed up I would delete all the
>>> snapshot.* and log.* files except the latest one. In your case I would leave
>>> snapshot.d0002bf88
>>> in the data folder. Please note the number at the end of the file name: it is the
>>> transaction number after which this snapshot was created. On each of your
>>> ZK servers you will have a file for which this number is in the same
>>> range. Keep those files on the server.
>>> 
>>> I do not think you need to initialize any data manually. Once the snapshot
>>> files are in place you can start the server and most likely it will
>>> come up.
>>> 
>>> All the best.
>>> 
>>> 
>>> 
>>> 
>>> 
>>> On Mon, Feb 22, 2016 at 7:08 AM, AALISHE <aalishe@gmail.com> wrote:
>>> 
>>>> Hi Vikrant/All,
>>>> 
>>>> I have some thoughts about the steps to share:
>>>> 
>>>> 
>>>> 1- Since this is a 3 node cluster, I must identify which one is the
>>>> leader ZK node
>>>> 2- Stop ZK from Cloudera Manager
>>>> 3- Go to the snapshot folder (on the leader) and take a backup aside
>>>> 4- Delete the files (snapshot + log) with the newest date stamp? (on all 3
>>>> nodes)
>>>> 5- Start ZK and make sure the previous leader is the current leader? Or
>>>> maybe I should initialize ZK data?
>>>> 
>>>> 
>>>> 
>>>> Can anyone take a look please and confirm/correct the above steps.
>>>> 
>>>> 
>>>> cheers!
>>>> 
>>>> On Mon, Feb 22, 2016 at 4:31 PM, vikrant singh <vikrant.thakur@gmail.com>
>>>> wrote:
>>>> 
>>>>> I have not tried it, but as I understand it, these should be the steps to
>>>>> follow.
>>>>> Step1 - back up these snapshot files
>>>>> Step2 - choose the snapshot files from which you want to recover.
>>>>> Step3 - remove all other files from data dir
>>>>> Step4 - Start server
>>>>> 
>>>>> On Mon, Feb 22, 2016 at 2:04 AM, AALISHE <aalishe@gmail.com> wrote:
>>>>> 
>>>>>> Anything anyone please?
>>>>>> On Feb 21, 2016 5:51 PM, "AALISHE" <aalishe@gmail.com> wrote:
>>>>>> 
>>>>>>> 
>>>>>>>> thanks Ted,
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> this is the link   http://pastebin.com/CgGi45EN
>>>>>>> 
>>>>>>> 
>>>>>>> cheers!
>>>>>>> 
>>>>>> 
>>>>> 
>>>> 
>> 
>> 

