Return-Path: X-Original-To: apmail-zookeeper-user-archive@www.apache.org Delivered-To: apmail-zookeeper-user-archive@www.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id 1EF25CD88 for ; Wed, 9 May 2012 20:28:27 +0000 (UTC) Received: (qmail 83422 invoked by uid 500); 9 May 2012 20:28:26 -0000 Delivered-To: apmail-zookeeper-user-archive@zookeeper.apache.org Received: (qmail 83353 invoked by uid 500); 9 May 2012 20:28:26 -0000 Mailing-List: contact user-help@zookeeper.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: user@zookeeper.apache.org Delivered-To: mailing list user@zookeeper.apache.org Received: (qmail 83035 invoked by uid 99); 9 May 2012 20:28:25 -0000 Received: from athena.apache.org (HELO athena.apache.org) (140.211.11.136) by apache.org (qpsmtpd/0.29) with ESMTP; Wed, 09 May 2012 20:28:25 +0000 X-ASF-Spam-Status: No, hits=-2.3 required=5.0 tests=RCVD_IN_DNSWL_MED,SPF_PASS X-Spam-Check-By: apache.org Received-SPF: pass (athena.apache.org: domain of jzimmerman@netflix.com designates 69.53.237.163 as permitted sender) Received: from [69.53.237.163] (HELO exout102.netflix.com) (69.53.237.163) by apache.org (qpsmtpd/0.29) with ESMTP; Wed, 09 May 2012 20:28:12 +0000 DKIM-Signature: v=1; a=rsa-sha1; c=relaxed/relaxed; s=s1024;d=netflix.com; h=from:to:subject:date:message-id:in-reply-to:content-type:mime-version; bh=i3o9zwxNgCwMspmM7jB+Lfaglg8=; b=nCLS+XfetA570JYMJspjlpKrhHS6AFHOWSiVqvYMhs8QgOhkpVZc1ibTvjHr/gdDA3sYCyzB z6Pg0AVREUUZ08PF3AG9n5ika7WYm5ZXLBSC0C63zTxO0tVXiK8ZxBi3NqgswpO+SPKg1Nlb usa2ByEegYj2Qf7q4w6B7M06E0g= DomainKey-Signature: a=rsa-sha1; q=dns; c=nofws; s=s1024;d=netflix.com; h=from:to:subject:date:message-id:in-reply-to:content-type:mime-version; b=XbKVx+cTYpfgOqg72dAVQNwTFDc2la3eauZJX9QM8b3Em0pLEpvgsUSlMN2oLfpVWq9y+uXW tTQek/gXmZuXTHC4Mm5nADV5zTxADH0ygOsvYyeCs5mEGcmCxEjXMC4aoOu1GbqIFtgWxk3B Ommg8sQlj8nKaK9lGxloLNEo2XM= Received: from EXFE104.corp.netflix.com (10.64.32.104) by exout102.netflix.com (10.64.240.74) with Microsoft SMTP Server (TLS) id 8.3.245.1; Wed, 9 May 2012 13:27:11 -0700 Received: from EXMB107.corp.netflix.com ([169.254.7.134]) by exfe104.corp.netflix.com ([10.64.32.104]) with mapi id 14.02.0283.003; Wed, 9 May 2012 13:27:51 -0700 From: Jordan Zimmerman To: "user@zookeeper.apache.org" Subject: Re: Watch not sent immediately? Thread-Topic: Watch not sent immediately? Thread-Index: AQHNKeYJUL9McHmC7EOJa/3A1y+FLpbCZLiA//+L4QA= Date: Wed, 9 May 2012 20:27:50 +0000 Message-ID: In-Reply-To: Accept-Language: en-US Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: user-agent: Microsoft-MacOutlook/14.10.0.110310 x-originating-ip: [10.64.24.147] Content-Type: text/plain; charset="us-ascii" Content-ID: <4F215487A5CE9147BA4DE31399852A90@netflix.com> Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 X-Virus-Checked: Checked by ClamAV on apache.org Interesting - this issue has come up several times with Curator users. I ended up writing a Tech Note on it. https://github.com/Netflix/curator/wiki/Tech-Note-1 -JZ On 5/9/12 1:23 PM, "Patrick Hunt" wrote: >I believe the issue is that there is a single thread updating >watchers. If you block that thread then the event can't be delivered. > >Patrick > >On Fri, May 4, 2012 at 4:06 AM, guru singh wrote: >> Hi, >> >> Sorry if the subject is not appropriately titled. >> >> I'm trying to implement a redis-failover solution using zookeeper. >> I've been working with the python binding for zk >> Basically, I have a znode called /master, a watch is set on this so >> that, whenever master changes, self.master is upated >> There is another znode called /errors, a watch is set on this via >> get_children to errors_watcher function. >> My code is supposed to continuously loop and create a childe znode on >> /errors, whenever an error is detected. >> The function errors_watcher, counts the number of children for znode >> /errors, if it exceeds a certain length, it writes a new master >> 'ip:port' to the znode /master, this calls the master watcher and >> updates self.master. I use python's threading.Condition() to block for >> certain operations, for instance initially when znode /master is >> created, I wait() for master_watcher to be called which updates >> self.master and releases the lock. This works as expected, however the >> problem is that when znode /master is changed from within >> errors_watcher, if I wait() for master_watcher to be called, updating >> self.master and then releasing the lock. The code just keeps waiting, >> the master_watcher is never called. However, if I don't wait after >> setting znode /master from within errors_watcher, master_watcher is >> called and it updates self.master. >> >> It'll be really helpful if somebody could point out what's wrong? Is >> it zk or is my understanding of threading.Condition() incorrect? >> Or both :) >> Thanks for your help >> >> This code snippet below, simulates the problem. >> >> class ZKtest: >> >> def __init__(self,zk_server): >> zk.set_log_stream(open('zk.log','w')) >> self.master =3D None >> self.zk_server =3D zk_server >> self.connected =3D False >> self.conn_cv =3D threading.Condition() >> >> def global_watcher(self,handle,event,state,path): >> self.conn_cv.acquire() >> print 'global watcher called' >> self.connected =3D True >> self.conn_cv.notifyAll() >> self.conn_cv.release() >> >> def master_watcher(self,handle,event,state,path): >> self.conn_cv.acquire() >> print 'master watcher called' >> master =3D zk.get(self.handle,path,self.master_watcher)[0] >> self.master =3D master >> print 'Master is %s' %(master) >> self.conn_cv.notifyAll() >> self.conn_cv.release() >> >> def errors_watcher(self,handle,event,state,path): >> self.conn_cv.acquire() >> print 'error watcher called' >> errors =3D=20 >>len(zk.get_children(self.handle,'/errors',self.errors_watcher)) >> print 'Current errors %d' %(errors) >> if errors > 5 : >> print 'Set new master, update znode /master' >> zk.set(self.handle,'/master','127.0.0.1:6380') >> #self.conn_cv.wait() <-- Why doesn't this return?? >> self.conn_cv.notifyAll() >> self.conn_cv.release() >> >> >> def create_znodes(self): >> self.conn_cv.acquire() >> master =3D zk.exists(self.handle,'/master',self.master_watcher) >> if not master: >> print 'Creating znode /master' >> zk.create(self.handle,'/master','127.0.0.1:6379', >> [ZOO_OPEN_ACL_UNSAFE]) >> else : >> print 'Updating znode /master' >> =20 >>zk.set(self.handle,'/master','127.0.0.1:6379',master['version']) >> self.conn_cv.wait() # wait until master_watcher has updated >> self.master, this returns after master_watcher is called >> print self.master # should not be None, since master_watcher >>updates it >> errors =3D zk.exists(self.handle,'/errors') >> if not errors: >> print 'Creating znode /errors' >> zk.create(self.handle,'/errors','Errors follow', >> [ZOO_OPEN_ACL_UNSAFE]) >> else : >> print 'Purge previous errors' >> for err in zk.get_children(self.handle,'/errors'): >> zk.delete(self.handle,'/errors/'+err) >> err =3D zk.get_children(self.handle,'/errors',self.errors_watcher= ) >> # set a watch for children of znode /errors >> self.conn_cv.release() >> >> >> def run(self): >> self.conn_cv.acquire() >> self.handle =3D zk.init(self.zk_server,self.global_watcher) >> if not self.connected: >> while not self.connected : >> print 'Not Connected, retry in 5' >> self.conn_cv.wait(5) >> self.handle =3D zk.init(self.zk_server) >> self.create_znodes() >> while self.master !=3D '127.0.0.1:6380': >> print 'Current Master %s' %(self.master) >> # simulate errors, until master is not 127.0.0.1:6380 >> =20 >>zk.create(self.handle,'/errors/','Error!',[ZOO_OPEN_ACL_UNSAFE], >> zk.SEQUENCE) >> self.conn_cv.wait() >> self.conn_cv.release() >> >> >> if __name__ =3D=3D '__main__' : >> zkt =3D ZKtest('127.0.0.1:2181') >> zkt.run() >