zookeeper-user mailing list archives

From Camille Fournier <cami...@apache.org>
Subject Re: zk keeps disconnecting and reconnecting
Date Tue, 30 Aug 2011 23:50:39 GMT
If we think 3.4 is going to be stable I am cool with just doing a 3.4
release. If we are concerned with the stability of 3.4 though, we need
to go through the bugfixes that were put in and figure out what should
be in 3.3.4. A lot of them got put into both I think (1046 for
example).

C

On Tue, Aug 30, 2011 at 7:45 PM, Benjamin Reed <breed@apache.org> wrote:
> i have been wondering about 3.3.4. there are so many great bugs that
> were fixed in 3.4.0 that it isn't clear what we should put into 3.3.4
> or if we should even do it. the chroot bug does seem like a good one
> to do a 3.3.4 release for.
>
> ben
>
> On Mon, Aug 29, 2011 at 12:45 PM, Mahadev Konar <mahadev@hortonworks.com> wrote:
>> Camille,
>>  I will be cutting a branch this week some time. Just waiting for
>> ZOOKEEPER-999 to get in. Other than that, we are probably 2 weeks away
>> from the release.
>>  3.3.4 would be good even if we have 3.4 coming in a week or 2. That's
>> because 3.4.0 might take some time to stabilize and 3.3.4 would be a good
>> stable release (recommended for production use), until 3.4 stabilizes.
>>  Does that sound reasonable? Others?
>>
>> thanks
>> mahadev
>>
>> On Aug 29, 2011, at 12:38 PM, Fournier, Camille F. wrote:
>>
>>> Yeah let's put it in 3.3.4. What's the plan for 3.4? I thought we were
>>> almost ready for that.
>>>
>>> C
>>>
>>> -----Original Message-----
>>> From: Mahadev Konar [mailto:mahadev@hortonworks.com]
>>> Sent: Monday, August 29, 2011 2:10 PM
>>> To: user@zookeeper.apache.org
>>> Subject: Re: zk keeps disconnecting and reconnecting
>>>
>>> Camille,
>>> Do you think we should put the fix in 3.3.4? I think 3.4 might take a
>>> while to stabilize, so 3.3.4 would be a good release to get this in.
>>>
>>> Thoughts?
>>>
>>> mahadev
>>>
>>> On Aug 29, 2011, at 10:50 AM, Fournier, Camille F. wrote:
>>>
>>>> Well, it causes the problem you are seeing. If you set any watchers with
>>>> a chroot and then your client gets disconnected with these watches
>>>> outstanding, when you reconnect you will try to reset them and they are
>>>> probably on paths that don't exist (if you are creating everything under
>>>> path /kafka-tracking). So you get a notification about the watches
>>>> immediately after resetting them, which causes the string out of bounds
>>>> exception.
>>>>
>>>> The only fix is to disable auto watch reset, and then have your own client
>>>> reset watches when it gets a reconnected event. I suspect it would be
>>>> easier for you to take a shot at fixing the bug than to rewrite your
>>>> client to handle this. Thomas provided a patch with tests that presumably
>>>> show the error, so all you need is a fix to make them pass.
>>>>
>>>>
>>>> C
>>>>
>>>> -----Original Message-----
>>>> From: Jun Rao [mailto:junrao@gmail.com]
>>>> Sent: Monday, August 29, 2011 12:39 PM
>>>> To: user@zookeeper.apache.org; thomas@koch.ro
>>>> Subject: Re: zk keeps disconnecting and reconnecting
>>>>
>>>> What's the impact of ZOOKEEPER-961? If it shows up, does that mean the
>>>> client won't get any watcher events afterwards? If so, this sounds like a
>>>> blocker for 3.4 release to me. What's the temporary solution for 3.3.3?
>>>>
>>>> Also, for the very first time that the ZK client gets disconnected, I saw
>>>> the following entry in the log. It seems that the client can't ping the
>>>> server for 4 seconds. The ZK server was up at that time and the load was
>>>> minimal. What could cause the time out? Client GC pauses?
>>>>
>>>> 2011/08/26 10:58:22.306 INFO [ClientCnxn]
>>>> [main-SendThread(esv4-app27.stg:12913)] [kafka] Client session timed out,
>>>> have not heard from server in 4001ms for sessionid 0x131fddd84bc0006,
>>>> closing socket connection and attempting reconnect
>>>>
>>>> Thanks,
>>>>
>>>> Jun
>>>>
>>>> On Mon, Aug 29, 2011 at 7:54 AM, Thomas Koch <thomas@koch.ro> wrote:
>>>>
>>>>> Fournier, Camille F.:
>>>>>> Did anyone ever check resetting watches at client reconnect on a client
>>>>>> with a chroot? Looking at the code, we store the watches associated with
>>>>>> the non-chroot path, but they are set by the original request prepending
>>>>>> chroot to the request. However, it looks like the SetWatches request on
>>>>>> reconnect just calls get on the various watch lists from ZooKeeper, which
>>>>>> don't have the prepended chroot.
>>>>>>
>>>>>> I haven't written a test but I would bet dollars to donuts this is the
>>>>>> problem.
>>>>>>
>>>>>> C
>>>>> seems to be this:
>>>>> ZOOKEEPER-961, ZOOKEEPER-1091
>>>>>
>>>>> Regards,
>>>>>
>>>>> Thomas Koch, http://www.koch.ro
>>>>>
>>>
>>
>>
>
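
The failure mode discussed in the quoted thread can be illustrated with a small, self-contained model. This is only a sketch under stated assumptions, not ZooKeeper source code: the class ChrootWatchResetSketch and its three helper methods are hypothetical names, and the watch path /brokers is made up for illustration (only the chroot /kafka-tracking comes from the thread). It assumes the client strips the chroot from an incoming event path by taking a substring, and shows why re-sending watches without the chroot can end in a string index out of bounds error, while prepending the chroot on reset avoids it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Simplified model of the chroot watch-reset problem; not ZooKeeper source code.
final class ChrootWatchResetSketch {

    // Strip the chroot from a server-side path before delivering an event.
    // Throws StringIndexOutOfBoundsException when serverPath is shorter than chroot.
    static String toClientPath(String chroot, String serverPath) {
        if (serverPath.equals(chroot)) {
            return "/";
        }
        return serverPath.substring(chroot.length());
    }

    // Problematic reset: re-send the stored client-relative paths unchanged,
    // so the server sees paths that lack the chroot prefix.
    static List<String> resetPathsWithoutChroot(List<String> clientPaths) {
        return new ArrayList<>(clientPaths);
    }

    // One possible repair (an assumption, not the committed patch):
    // prepend the chroot to every stored path before re-sending it.
    static List<String> resetPathsWithChroot(String chroot, List<String> clientPaths) {
        List<String> serverPaths = new ArrayList<>();
        for (String path : clientPaths) {
            serverPaths.add(path.equals("/") ? chroot : chroot + path);
        }
        return serverPaths;
    }

    public static void main(String[] args) {
        String chroot = "/kafka-tracking";
        // Hypothetical watch, stored client-side without the chroot.
        List<String> watches = Arrays.asList("/brokers");

        // The server echoes back whatever path it was given in the reset.
        for (String serverPath : resetPathsWithoutChroot(watches)) {
            try {
                System.out.println("delivered: " + toClientPath(chroot, serverPath));
            } catch (StringIndexOutOfBoundsException e) {
                System.out.println("failed to strip chroot from " + serverPath);
            }
        }
        for (String serverPath : resetPathsWithChroot(chroot, watches)) {
            System.out.println("delivered: " + toClientPath(chroot, serverPath));
        }
    }
}
```

In this model the first loop fails because the path that comes back is shorter than the chroot, and the second loop round-trips cleanly. The workaround mentioned in the thread (disabling auto watch reset and re-registering watches from application code after a reconnect) sidesteps the same mismatch, because watches set through the normal request path already get the chroot prepended.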
