zookeeper-user mailing list archives

From Benjamin Reed <br...@apache.org>
Subject Re: lost ZK events across datacenters
Date Fri, 03 Jun 2011 17:18:23 GMT
actually, i think the transaction log could help a lot, and that will
always be there. two scenarios i can think of are:
1) the change happened before the watch was set
2) the change never got there
you could get an answer to both of those questions by looking at the
transaction log.

ben
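Ben's two scenarios can be told apart mechanically once the relevant zxids are known. The sketch below is minimal and rests on two assumptions. First, the zxid of the child deletion has been pulled from the server's transaction log, for example with the LogFormatter class in org.apache.zookeeper.server. Second, the client recorded the parent's pzxid from the Stat that getChildren(path, true, stat) fills in when it arms the watch, if your client version has that overload. MissedWatchTriage and its verdict names are hypothetical.

```java
import java.util.OptionalLong;

/**
 * Hypothetical triage helper for the two scenarios above; not part of ZooKeeper.
 * Assumes pzxidSeenAtWatch came from the Stat of the read that armed the watch,
 * and deleteZxidInTxnLog came from dumping the server's transaction log.
 */
final class MissedWatchTriage {
    enum Verdict { CHANGE_BEFORE_WATCH, CHANGE_NEVER_LOGGED, WATCH_SHOULD_HAVE_FIRED }

    static Verdict classify(long pzxidSeenAtWatch, OptionalLong deleteZxidInTxnLog) {
        if (!deleteZxidInTxnLog.isPresent()) {
            // Scenario 2: the deletion was never committed, so there was nothing to fire.
            return Verdict.CHANGE_NEVER_LOGGED;
        }
        long deleteZxid = deleteZxidInTxnLog.getAsLong();
        if (deleteZxid <= pzxidSeenAtWatch) {
            // Scenario 1: the read that armed the watch already reflected the deletion,
            // so the watch was waiting for the next change, not this one.
            return Verdict.CHANGE_BEFORE_WATCH;
        }
        // The deletion was committed after the watch was armed: it should have fired.
        return Verdict.WATCH_SHOULD_HAVE_FIRED;
    }
}
```

Only the third verdict points at a genuinely lost notification; the other two mean the watch behaved as designed.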

On Fri, Jun 3, 2011 at 9:59 AM, Jun Rao <junrao@gmail.com> wrote:
> I don't expect that we can find the cause right now. However, what can I do
> to collect enough tracing should the problem occur again in the future (e.g.,
> is INFO-level logging enough)?
>
> Thanks,
>
> Jun
>
> On Fri, Jun 3, 2011 at 9:56 AM, Jun Rao <junrao@gmail.com> wrote:
>
>> None of the clients' logs have any state-changing entries around the time
>> the watcher was triggered.
>>
>> Jun
>>
>>
>> On Fri, Jun 3, 2011 at 9:32 AM, Fournier, Camille F. [Tech] <
>> Camille.Fournier@gs.com> wrote:
>>
>>> Any state changes for the problem client between setting the watch and
>>> when you expected it to get called? Do you have logs for that client vs the
>>> others that show anything?
>>>
>>> -----Original Message-----
>>> From: Jun Rao [mailto:junrao@gmail.com]
>>> Sent: Friday, June 03, 2011 4:40 AM
>>> To: user@zookeeper.apache.org
>>> Subject: Re: lost ZK events across datacenters
>>>
>>> Ben,
>>>
>>> Some details below.
>>>
>>> The call that sets the watcher simply calls getChildren with the watcher
>>> flag set to true. The triggering change is that one of the child nodes
>>> (which is ephemeral) is deleted because the client that created it is gone.
>>>
>>> Thanks,
>>>
>>> Jun
>>>
>>> On Thu, Jun 2, 2011 at 10:49 AM, Benjamin Reed <breed@apache.org> wrote:
>>>
>>> > can you tell us a bit more about the scenario? what was the call that
>>> > set the watch? and what were the changes that caused the event?
>>> >
>>> > thanx
>>> > ben
>>> >
>>> > On Wed, Jun 1, 2011 at 3:14 PM, Jun Rao <junrao@gmail.com> wrote:
>>> > > All my clients were on different machines. Two of them got the watcher
>>> > > fired at about the same time. The third one never got the watcher
>>> > > triggered.
>>> > >
>>> > > Thanks,
>>> > >
>>> > > Jun
>>> > >
>>> > > On Wed, Jun 1, 2011 at 2:18 PM, Fournier, Camille F. [Tech] <
>>> > > Camille.Fournier@gs.com> wrote:
>>> > >
>>> > >> All clients are in different processes?
>>> > >> I've used zkclient and haven't seen any problems, but I haven't
>>> > >> hammered it too hard yet. I took a long look at the code and didn't
>>> > >> see any errors, but there could always be something very subtle.
>>> > >>
>>> > >> -----Original Message-----
>>> > >> From: Jun Rao [mailto:junrao@gmail.com]
>>> > >> Sent: Wednesday, June 01, 2011 4:09 PM
>>> > >> To: user@zookeeper.apache.org
>>> > >> Subject: Re: lost ZK events across datacenters
>>> > >>
>>> > >> I am using the zkclient package
>>> > >> (https://github.com/sgroschupf/zkclient.git).
>>> > >> The watcher code seems reasonable. Basically, each watcher event is
>>> > >> first added to a queue. A separate event thread dequeues each event,
>>> > >> reads the children of the path (which re-registers the watcher), and
>>> > >> invokes the registered listener.
>>> > >>
>>> > >> Does anybody know of any issues in zkclient?
>>> > >>
>>> > >> Thanks,
>>> > >>
>>> > >> Jun
>>> > >>
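The queue-plus-event-thread design described above can be sketched with the JDK alone. This is a hypothetical ChildEventLoop class, not zkclient's actual code; readAndRewatch stands in for a call such as getChildren(path, true) that both reads the children and re-arms the watch. The sketch also shows one failure mode such a design has to guard against: if a listener throws and nothing catches it, the single event thread dies and every later event sits in the queue unprocessed, which from the outside would look like a watch that never fired.

```java
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.BiConsumer;
import java.util.function.Function;

/** Hypothetical sketch of a queue plus a dedicated event thread; not zkclient's code. */
final class ChildEventLoop {
    private final BlockingQueue<String> events = new LinkedBlockingQueue<>();
    // Hypothetical stand-in for getChildren(path, true): reads AND re-arms the watch.
    private final Function<String, List<String>> readAndRewatch;
    private final BiConsumer<String, List<String>> listener;
    private final Thread eventThread;

    ChildEventLoop(Function<String, List<String>> readAndRewatch,
                   BiConsumer<String, List<String>> listener) {
        this.readAndRewatch = readAndRewatch;
        this.listener = listener;
        this.eventThread = new Thread(this::run, "child-event-thread");
        this.eventThread.setDaemon(true);
    }

    void start() { eventThread.start(); }

    /** Called from the watcher callback: only enqueue, never block the callback thread. */
    void onWatchFired(String path) { events.add(path); }

    void stop() {
        eventThread.interrupt();
        try {
            eventThread.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    private void run() {
        while (!Thread.currentThread().isInterrupted()) {
            String path;
            try {
                path = events.take();
            } catch (InterruptedException e) {
                return; // stop() was called
            }
            try {
                // Re-read the children (re-arming the watch) and hand them to the listener.
                listener.accept(path, readAndRewatch.apply(path));
            } catch (RuntimeException e) {
                // Without this catch, one failing listener would end the thread and
                // every later event would stay in the queue unprocessed.
                System.err.println("listener failed for " + path + ": " + e);
            }
        }
    }
}
```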
>>> > >> On Wed, Jun 1, 2011 at 12:04 PM, Ted Dunning <ted.dunning@gmail.com>
>>> > >> wrote:
>>> > >>
>>> > >> > In my own history of programming errors, this is most commonly due
>>> > >> > to writing code that has a race window in it. It is conceivable
>>> > >> > that cross-datacenter operation would make such a race more of a
>>> > >> > problem.
>>> > >> >
>>> > >> > Can you say a bit about your code? Did you make sure to use
>>> > >> > standard idioms, as opposed to setting the watch in a different call
>>> > >> > from reading the data?
>>> > >> >
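The race Ted describes can be reproduced with a toy, in-memory model of one-shot child watches. ToyZnode below is hypothetical and is not the ZooKeeper API. It contrasts reading the children and arming the watch in one call (what getChildren(path, true) does on the real server) with reading first and arming the watch in a separate call. If the child is deleted in the gap between the two calls, the late watch never fires for that deletion and the client keeps a stale child list; higher round-trip latency across datacenters widens that gap.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

/**
 * Hypothetical single-process model of a znode with one-shot child watches.
 * Watches run synchronously under the lock here; a real client delivers them
 * on its own event thread.
 */
final class ToyZnode {
    private final TreeSet<String> children = new TreeSet<>();
    private final List<Runnable> childWatches = new ArrayList<>();

    /** Read and arm in one step, like getChildren(path, true): no gap between them. */
    synchronized List<String> getChildrenAndWatch(Runnable watch) {
        childWatches.add(watch);
        return new ArrayList<>(children);
    }

    /** Plain read with no watch. */
    synchronized List<String> getChildren() {
        return new ArrayList<>(children);
    }

    /** Arm a watch in a separate call: the racy pattern. */
    synchronized void watchChildren(Runnable watch) {
        childWatches.add(watch);
    }

    synchronized void addChild(String name) {
        if (children.add(name)) fire();
    }

    synchronized void deleteChild(String name) {
        if (children.remove(name)) fire();
    }

    /** One-shot semantics: every armed watch fires once and is then cleared. */
    private void fire() {
        List<Runnable> toFire = new ArrayList<>(childWatches);
        childWatches.clear();
        toFire.forEach(Runnable::run);
    }
}
```

Because watches are one-shot, the usual idiom is for every notification handler to re-read with the watch flag set, so that each read re-arms the watch atomically.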
>>> > >> > On Wed, Jun 1, 2011 at 11:40 AM, Jun Rao <junrao@gmail.com> wrote:
>>> > >> >
>>> > >> > > Hi,
>>> > >> > >
>>> > >> > > I have a setup where multiple ZK clients sit in a different
>>> > >> > > datacenter from the ZK server. All clients registered the same
>>> > >> > > child watcher on a path. However, when the children of the path
>>> > >> > > changed, the watcher on one of the clients didn't fire. This seems
>>> > >> > > to have happened to me a couple of times. I am using ZK 3.3.3. Has
>>> > >> > > anyone used ZK in a cross-datacenter setup and seen problems like
>>> > >> > > that before?
>>> > >> > >
>>> > >> > > Thanks,
>>> > >> > >
>>> > >> > > Jun
>>> > >> > >
>>> > >> >
>>> > >>
>>> > >
>>> >
>>>
>>
>>
>
