zookeeper-user mailing list archives

From "Fournier, Camille F. [Tech]" <Camille.Fourn...@gs.com>
Subject RE: lost ZK events across datacenters
Date Mon, 06 Jun 2011 19:13:20 GMT
Hey Jun, question: What version of Java are your clients running? I keep hitting a bug in my
java5 test suite and I'm wondering if in fact I am seeing the same problem you're reporting
here. 

C

-----Original Message-----
From: Jun Rao [mailto:junrao@gmail.com] 
Sent: Friday, June 03, 2011 12:59 PM
To: user@zookeeper.apache.org
Subject: Re: lost ZK events across datacenters

I don't expect that we can discover the problem right now. However, what are
the things that I can do to collect enough tracing should the problem occur
again in the future (e.g., is INFO level logging enough)?

Thanks,

Jun

On Fri, Jun 3, 2011 at 9:56 AM, Jun Rao <junrao@gmail.com> wrote:

> The log doesn't have any state changing entries around the time the watcher
> is triggered, in all clients.
>
> Jun
>
>
> On Fri, Jun 3, 2011 at 9:32 AM, Fournier, Camille F. [Tech] <Camille.Fournier@gs.com> wrote:
>
>> Any state changes for the problem client between setting the watch and
>> when you expected it to get called? Do you have logs for that client vs the
>> others that show anything?
>>
>> -----Original Message-----
>> From: Jun Rao [mailto:junrao@gmail.com]
>> Sent: Friday, June 03, 2011 4:40 AM
>> To: user@zookeeper.apache.org
>> Subject: Re: lost ZK events across datacenters
>>
>> Ben,
>>
>> Some details below.
>>
>> The call that sets the watcher simply calls getChildren with the watcher flag
>> set to true. The triggering change is that one of the child nodes (which is
>> ephemeral) is deleted because the creating client is gone.
>>
>> Thanks,
>>
>> Jun
>>
>> On Thu, Jun 2, 2011 at 10:49 AM, Benjamin Reed <breed@apache.org> wrote:
>>
>> > can you tell us a bit more about the scenario? what was the call that
>> > set the watch event? and what were the changes that caused the event?
>> >
>> > thanx
>> > ben
>> >
>> > On Wed, Jun 1, 2011 at 3:14 PM, Jun Rao <junrao@gmail.com> wrote:
>> > > All my clients were on different machines. 2 of them got the watcher fired
>> > > about the same time. The third one never got the watcher triggered.
>> > >
>> > > Thanks,
>> > >
>> > > Jun
>> > >
>> > > On Wed, Jun 1, 2011 at 2:18 PM, Fournier, Camille F. [Tech] <Camille.Fournier@gs.com> wrote:
>> > >
>> > >> All clients are in different processes?
>> > >> I've used zkclient and haven't seen any problems, but I haven't hammered it
>> > >> too hard yet. I took a long look at the code and didn't see any errors but
>> > >> there could always be something very subtle.
>> > >>
>> > >> -----Original Message-----
>> > >> From: Jun Rao [mailto:junrao@gmail.com]
>> > >> Sent: Wednesday, June 01, 2011 4:09 PM
>> > >> To: user@zookeeper.apache.org
>> > >> Subject: Re: lost ZK events across datacenters
>> > >>
>> > >> I am using the zkclient package (https://github.com/sgroschupf/zkclient.git).
>> > >> The watcher code seems reasonable. Basically, each watcher event is first
>> > >> added to a queue. A separate event thread dequeues each event and reads the
>> > >> children of a path (which re-registers the watcher) and invokes the
>> > >> registered listener.
>> > >>
>> > >> Does anybody know of any issues in zkclient?
>> > >>
>> > >> Thanks,
>> > >>
>> > >> Jun
>> > >>
>> > >> On Wed, Jun 1, 2011 at 12:04 PM, Ted Dunning <ted.dunning@gmail.com> wrote:
>> > >>
>> > >> > This is most commonly due, in my own history of programming errors, to
>> > >> > writing code that has a race window in it.  It is conceivable that cross
>> > >> > data-center operation would make such a race more of a problem.
>> > >> >
>> > >> > Can you say a bit about your code?  Did you make sure to use standard
>> > >> > idioms as opposed to setting the watch in a different call from reading
>> > >> > the data?
>> > >> >
>> > >> > On Wed, Jun 1, 2011 at 11:40 AM, Jun Rao <junrao@gmail.com> wrote:
>> > >> >
>> > >> > > Hi,
>> > >> > >
>> > >> > > I have a setup where multiple ZK clients are sitting in a different
>> > >> > > datacenter from the ZK server. All clients registered the same child
>> > >> > > watcher on a path. However, when the children of the path changed, the
>> > >> > > watcher on 1 of the clients didn't fire. This seems to have happened a
>> > >> > > couple of times to me. I am using ZK 3.3.3. Has anyone used ZK in a cross
>> > >> > > datacenter setup and seen problems like that before?
>> > >> > >
>> > >> > > Thanks,
>> > >> > >
>> > >> > > Jun
>> > >> > >
>> > >> >
>> > >>
>> > >
>> >
>>
>
>
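
The race window Ted asks about (setting the watch in a different call from
reading the data) can be illustrated with a small self-contained model. This is
only a sketch under stated assumptions: TinyStore, ChildTracker and both
scenario methods are hypothetical names invented for this example, and
TinyStore is an in-memory stand-in for one-shot child watches, not the
ZooKeeper or zkclient API. It does not reproduce the cross-datacenter problem
reported in this thread; it only shows why reading and arming the watch in one
call avoids a missed update.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

public class WatchIdiomDemo {

    /** Hypothetical in-memory stand-in for one parent node. NOT the ZooKeeper API. */
    static final class TinyStore {
        private final Set<String> children = new TreeSet<>();
        private final List<Runnable> watchers = new ArrayList<>();

        /** Reads the children and, if watcher is non-null, arms it in the same step. */
        synchronized List<String> getChildren(Runnable watcher) {
            if (watcher != null) {
                watchers.add(watcher);
            }
            return new ArrayList<>(children);
        }

        synchronized void create(String name) {
            if (children.add(name)) fire();
        }

        synchronized void delete(String name) {
            if (children.remove(name)) fire();
        }

        // One-shot semantics: each armed watcher fires once and is then dropped.
        private void fire() {
            List<Runnable> toFire = new ArrayList<>(watchers);
            watchers.clear();
            toFire.forEach(Runnable::run);
        }
    }

    /** Keeps a local view of the children, re-arming the watch on every read. */
    static final class ChildTracker {
        private final TinyStore store;
        private List<String> lastSeen;

        ChildTracker(TinyStore store) {
            this.store = store;
            refresh();
        }

        // Read and re-arm in a single call, so no change can land in between.
        private void refresh() {
            lastSeen = store.getChildren(this::refresh);
        }

        List<String> lastSeen() {
            return lastSeen;
        }
    }

    /** Racy pattern: the data is read first, the watch is armed in a later call. */
    static List<String> racyScenario() {
        TinyStore store = new TinyStore();
        store.create("a");
        store.create("b");
        List<String> view = new ArrayList<>(store.getChildren(null)); // read, no watch
        store.delete("b"); // lands in the gap: nobody is watching yet
        store.getChildren(() -> { // watch armed too late, its result is ignored
            view.clear();
            view.addAll(store.getChildren(null));
        });
        return view; // stale: still contains "b"
    }

    /** Standard pattern: every read also arms the watch for the next change. */
    static List<String> safeScenario() {
        TinyStore store = new TinyStore();
        store.create("a");
        store.create("b");
        ChildTracker tracker = new ChildTracker(store);
        store.delete("b"); // fires the watch; refresh() re-reads and re-arms
        store.create("c"); // fires the re-armed watch
        return tracker.lastSeen();
    }

    public static void main(String[] args) {
        System.out.println("racy view: " + racyScenario()); // racy view: [a, b]
        System.out.println("safe view: " + safeScenario()); // safe view: [a, c]
    }
}
```

In the racy scenario the deletion of "b" happens before any watch exists, so
the local view stays at [a, b] until some later change fires the watch. In the
safe scenario each notification triggers a read that also re-arms the watch,
which matches the behaviour Jun describes for zkclient's event thread.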
