zookeeper-user mailing list archives

From Patrick Hunt <ph...@apache.org>
Subject Re: 15 minutes to sync?
Date Wed, 01 Aug 2012 00:13:42 GMT
Any monitoring of mem, gc, disk, etc... that might give some
additional insight? Perhaps the disks were loaded and that was slowing
things? Or swapping/gc of the jvm? You might be able to tune to
resolve some of that.

One thing you can try is copying the snapshot file to an empty
datadir on a separate machine and starting a 2-node cluster there
(where the second node starts with an empty datadir).

Patrick
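
A minimal sketch of staging that test, assuming the usual dataDir layout
(snapshots under version-2/, server id in a myid file). The function name
stage_snapshot and its arguments are hypothetical, and it only prepares a
local directory; moving it to the separate machine and writing the 2-node
config are left out.

```python
import shutil
from pathlib import Path


def stage_snapshot(snapshot_file, data_dir, server_id):
    """Copy one snapshot into a fresh, otherwise empty dataDir layout.

    Hypothetical helper: it assumes snapshots live in <dataDir>/version-2
    and that the server id is read from <dataDir>/myid.
    """
    data_dir = Path(data_dir)
    version_dir = data_dir / "version-2"
    # exist_ok=False makes this fail loudly if the datadir was already used.
    version_dir.mkdir(parents=True, exist_ok=False)

    target = version_dir / Path(snapshot_file).name
    shutil.copy2(snapshot_file, target)

    # Each server in the test cluster needs its own id in the myid file.
    (data_dir / "myid").write_text(f"{server_id}\n")
    return target
```

The second node would get the same layout with only a myid file, so that
it has to pull the full snapshot from the first node when it joins.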

On Tue, Jul 31, 2012 at 3:34 PM, Jordan Zimmerman
<jordan@jordanzimmerman.com> wrote:
>> Seems you are down to 4gb now. That still seems way too high for
>> "coordination" operations… ?
>
> A big problem currently is detritus nodes. People use lock recipes for various movie
> IDs and they leave garbage parent nodes around in the thousands. I've written some gc tasks
> to clean them up but it's been a slow process to get everyone to use it. I know there is a
> Jira to help with this but I don't know the status.
>
> -JZ
>
> On Jul 31, 2012, at 3:17 PM, Patrick Hunt <phunt@apache.org> wrote:
>
>> On Tue, Jul 31, 2012 at 3:14 PM, Jordan Zimmerman
>> <jordan@jordanzimmerman.com> wrote:
>>> There were a lot of creations but I removed those nodes last night. How long does
>>> it take to clear out of the snapshot?
>>
>> The snapshot is a copy of whatever is in the znode tree at the time
>> the snapshot is taken (so the removal shows up in the next snapshot
>> that is taken). You can see the dates and the epoch number if that
>> gives you any insight (the filename suffix is the zxid in hex; the
>> epoch is its upper 32 bits)
>>
>> Seems you are down to 4gb now. That still seems way too high for
>> "coordination" operations... ?
>>
>> Patrick
>>
>>>
>>> On Jul 31, 2012, at 2:52 PM, Patrick Hunt <phunt@apache.org> wrote:
>>>
>>>> You have an 11gig snapshot file. That's very large. Did someone
>>>> unexpectedly overload the server with znode creations?
>>>>
>>>> When a follower comes up the leader needs to serialize the znodes to
>>>> the snapshot file, stream it to the follower, who saves it locally
>>>> then deserializes it. (11 GB in 15 min averages about 12 MB/second
>>>> for this process)
>>>>
>>>> Oftentimes this is exacerbated by max heap size and GC interactions.
>>>>
>>>> Patrick
>>>>
>>>> On Tue, Jul 31, 2012 at 2:23 PM, Jordan Zimmerman
>>>> <jordan@jordanzimmerman.com> wrote:
>>>>> BTW - this is 3.3.5
>>>>>
>>>>> On Jul 31, 2012, at 2:22 PM, Jordan Zimmerman <jordan@jordanzimmerman.com> wrote:
>>>>>
>>>>>> We've had a few outages of our ZK cluster recently. When trying to
>>>>>> bring the cluster back up it's been taking 10-15 minutes for the followers to sync with the
>>>>>> Leader. Any idea what might cause this? Here's an ls of the data dir:
>>>>>>
>>>>>> -rw-r--r-- 1 zookeeperserverprod nac    67108880 Jul 31 20:39 log.3900a4bc75
>>>>>> -rw-r--r-- 1 zookeeperserverprod nac    67108880 Jul 31 20:40 log.3900a634ee
>>>>>> -rw-r--r-- 1 zookeeperserverprod nac    67108880 Jul 31 21:21 log.3a00000001
>>>>>> -rw-r--r-- 1 zookeeperserverprod nac    67108880 Jul 31 21:22 log.3a000139a2
>>>>>> -rw-r--r-- 1 zookeeperserverprod nac  9279729723 Jul 31 20:42 snapshot.3900a634ec
>>>>>> -rw-r--r-- 1 zookeeperserverprod nac 11126306780 Jul 31 21:09 snapshot.3900a6b149
>>>>>> -rw-r--r-- 1 zookeeperserverprod nac  4153727423 Jul 31 21:22 snapshot.3a000139a0
>>>>>>
>>>>>
>>>
>
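
For reference, the numbers in the thread can be checked against the ls
output above. The following is a rough sketch, not anything from ZooKeeper
itself: the helper names split_zxid and mb_per_second are made up for
illustration, the epoch is taken as the upper 32 bits of the hex zxid in
the filename suffix, and the 15 minutes is the upper end of the reported
10-15 minute sync time.

```python
# Snapshot files from the listing above: (size in bytes, file name).
SNAPSHOTS = [
    (9279729723, "snapshot.3900a634ec"),
    (11126306780, "snapshot.3900a6b149"),
    (4153727423, "snapshot.3a000139a0"),
]


def split_zxid(filename):
    """Return (epoch, counter) from a 'snapshot.<hex>' or 'log.<hex>' name."""
    zxid = int(filename.rsplit(".", 1)[1], 16)
    # Upper 32 bits are the epoch, lower 32 bits the per-epoch counter.
    return zxid >> 32, zxid & 0xFFFFFFFF


def mb_per_second(size_bytes, seconds):
    """Average transfer rate in megabytes (10**6 bytes) per second."""
    return size_bytes / seconds / 1e6


for size, name in SNAPSHOTS:
    epoch, counter = split_zxid(name)
    print(f"{name}: epoch={epoch:#x} counter={counter:#x} size={size / 1e9:.1f} GB")

# The largest snapshot, synced over roughly 15 minutes.
rate = mb_per_second(11126306780, 15 * 60)
print(f"average sync rate: {rate:.1f} MB/s")
```

The first two snapshots decode to epoch 0x39 and the 4 GB one to epoch
0x3a, so the smaller snapshot was written after a new leader election.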
