zookeeper-user mailing list archives

From David Nickerson <davidnickerson4mailingli...@gmail.com>
Subject Re: 15 minutes to sync?
Date Thu, 02 Aug 2012 23:57:40 GMT
>
> People use lock recipes for various movie IDs and they leave garbage
> parent nodes around in the thousands.


I came across this problem too. You probably have a fine solution, but I
solved it by asynchronously attempting to delete the parent node every time
a resource is unlocked. If the parent still has children, the delete fails
and nothing is lost. The only catch is that a client trying to lock a
resource can no longer rely on the parent node existing. As long as you
account for that, there's no problem.
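
A minimal sketch of that cleanup pattern, using a hypothetical in-memory
stand-in for the znode tree rather than the real ZooKeeper client API. The
class, the function names and the /locks/movie-1 path are all illustrative
assumptions, and the parent delete runs synchronously here for simplicity
instead of asynchronously:

```python
import posixpath


class NoNodeError(Exception):
    """The path, or its parent, does not exist."""


class NotEmptyError(Exception):
    """The node still has children and cannot be deleted."""


class InMemoryTree:
    """Hypothetical stand-in for a znode tree (not the ZooKeeper API)."""

    def __init__(self):
        self.nodes = {"/"}

    def create(self, path):
        # Like a znode, a node can only be created under an existing parent.
        if posixpath.dirname(path) not in self.nodes:
            raise NoNodeError(path)
        self.nodes.add(path)

    def delete(self, path):
        if path not in self.nodes:
            raise NoNodeError(path)
        prefix = path.rstrip("/") + "/"
        if any(n.startswith(prefix) for n in self.nodes):
            raise NotEmptyError(path)
        self.nodes.remove(path)


def lock(tree, parent, name):
    """Create the lock node, recreating the parent if cleanup removed it."""
    while True:
        try:
            tree.create(posixpath.join(parent, name))
            return
        except NoNodeError:
            # The parent may have been deleted by an earlier unlock.
            # If the grandparent is missing too, this raises and we stop.
            tree.create(parent)


def unlock(tree, parent, name):
    """Release the lock, then try to garbage-collect the parent."""
    tree.delete(posixpath.join(parent, name))
    try:
        tree.delete(parent)
    except (NotEmptyError, NoNodeError):
        # Other locks still live under the parent, or it is already gone.
        pass
```

With a real client the same two cases would need handling: the failed
delete of a non-empty parent on unlock, and the missing parent on lock.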

On Tue, Jul 31, 2012 at 6:34 PM, Jordan Zimmerman <
jordan@jordanzimmerman.com> wrote:

> > Seems you are down to 4gb now. That still seems way too high for
> > "coordination" operations… ?
>
> A big problem currently is detritus nodes. People use lock recipes for
> various movie IDs and they leave garbage parent nodes around in the
> thousands. I've written some gc tasks to clean them up but it's been a slow
> process to get everyone to use it. I know there is a Jira to help with this
> but I don't know the status.
>
> -JZ
>
> On Jul 31, 2012, at 3:17 PM, Patrick Hunt <phunt@apache.org> wrote:
>
> > On Tue, Jul 31, 2012 at 3:14 PM, Jordan Zimmerman
> > <jordan@jordanzimmerman.com> wrote:
> >> There were a lot of creations but I removed those nodes last night. How
> >> long does it take to clear out of the snapshot?
> >
> > The snapshot is a copy of whatever is in the znode tree at the time
> > the snapshot is taken. (so instantaneous the next time a snapshot is
> > taken). You can see the dates and the epoch number if that gives you
> > any insight (epoch is the upper 32 bits of the zxid in the filename)
> >
> > Seems you are down to 4gb now. That still seems way too high for
> > "coordination" operations... ?
> >
> > Patrick
> >
> >>
> >> On Jul 31, 2012, at 2:52 PM, Patrick Hunt <phunt@apache.org> wrote:
> >>
> >>> You have an 11gig snapshot file. That's very large. Did someone
> >>> unexpectedly overload the server with znode creations?
> >>>
> >>> When a follower comes up the leader needs to serialize the znodes to
> >>> the snapshot file, stream it to the follower, who saves it locally
> >>> then deserializes it. (11g/15min is avg about 12meg/second for this
> >>> process)
> >>>
> >>> Often times this is exacerbated by the max heap and GC interactions.
> >>>
> >>> Patrick
> >>>
> >>> On Tue, Jul 31, 2012 at 2:23 PM, Jordan Zimmerman
> >>> <jordan@jordanzimmerman.com> wrote:
> >>>> BTW - this is 3.3.5
> >>>>
> >>>> On Jul 31, 2012, at 2:22 PM, Jordan Zimmerman <jordan@jordanzimmerman.com> wrote:
> >>>>
> >>>>> We've had a few outages of our ZK cluster recently. When trying to
> >>>>> bring the cluster back up it's been taking 10-15 minutes for the followers
> >>>>> to sync with the Leader. Any idea what might cause this? Here's an ls of
> >>>>> the data dir:
> >>>>>
> >>>>> -rw-r--r-- 1 zookeeperserverprod nac    67108880 Jul 31 20:39 log.3900a4bc75
> >>>>> -rw-r--r-- 1 zookeeperserverprod nac    67108880 Jul 31 20:40 log.3900a634ee
> >>>>> -rw-r--r-- 1 zookeeperserverprod nac    67108880 Jul 31 21:21 log.3a00000001
> >>>>> -rw-r--r-- 1 zookeeperserverprod nac    67108880 Jul 31 21:22 log.3a000139a2
> >>>>> -rw-r--r-- 1 zookeeperserverprod nac  9279729723 Jul 31 20:42 snapshot.3900a634ec
> >>>>> -rw-r--r-- 1 zookeeperserverprod nac 11126306780 Jul 31 21:09 snapshot.3900a6b149
> >>>>> -rw-r--r-- 1 zookeeperserverprod nac  4153727423 Jul 31 21:22 snapshot.3a000139a0
> >>>>>
> >>>>
> >>
>
>
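
For reference, the figures quoted in the thread can be checked with a few
lines of Python. This is only a sketch: the helper names are made up, and it
assumes the suffix of each snapshot file name is a hex zxid whose upper 32
bits are the epoch and whose lower 32 bits are the per-epoch counter:

```python
def split_zxid(filename):
    """Return (epoch, counter) from a name such as 'snapshot.3900a634ec'."""
    zxid = int(filename.rsplit(".", 1)[1], 16)
    return zxid >> 32, zxid & 0xFFFFFFFF


def avg_rate_mb_per_s(size_bytes, seconds):
    """Average transfer rate in megabytes (10**6 bytes) per second."""
    return size_bytes / seconds / 1e6


for name in ("snapshot.3900a634ec", "snapshot.3900a6b149", "snapshot.3a000139a0"):
    epoch, counter = split_zxid(name)
    print(f"{name}: epoch={epoch:#x} counter={counter:#x}")

# The 11126306780-byte snapshot synced in roughly 15 minutes.
print(f"{avg_rate_mb_per_s(11126306780, 15 * 60):.1f} MB/s")
```

The first two snapshots come out in epoch 0x39 and the last in epoch 0x3a,
and the rate lands a little above 12 MB/s, in line with the estimate quoted
above.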
