From: Jeremy Stribling
Date: Sat, 29 Jan 2011 11:23:18 -0800
To: user@zookeeper.apache.org
Subject: hang in zookeeper_close() in the mt C client
Message-ID: <4D446926.4090602@nicira.com>

Hi everyone,

I use the multithreaded ZK C client library (3.3.2), and I'm seeing my application hang, and the only thread in it that's doing anything interesting is this one:

Thread 8 (Thread 5644):
#0  0x00007f5d7bb5bbe4 in __lll_lock_wait () from /lib/libpthread.so.0
#1  0x00007f5d7bb59ad0 in pthread_cond_broadcast@@GLIBC_2.3.2 () from /lib/libpthread.so.0
#2  0x00007f5d793628f6 in unlock_completion_list (l=0x32b4d68) at .../zookeeper/src/c/src/mt_adaptor.c:66
#3  0x00007f5d79354d4b in free_completions (zh=0x32b4c80, callCompletion=1, reason=-116) at .../zookeeper/src/c/src/zookeeper.c:1069
#4  0x00007f5d79355008 in cleanup_bufs (zh=0x32b4c80, callCompletion=1, rc=-116) at .../thirdparty/zookeeper/src/c/src/zookeeper.c:1125
#5  0x00007f5d79353200 in destroy (zh=0x32b4c80) at .../thirdparty/zookeeper/src/c/src/zookeeper.c:366
#6  0x00007f5d79358e0e in zookeeper_close (zh=0x32b4c80) at .../zookeeper/src/c/src/zookeeper.c:2326
#7  0x00007f5d79356d18 in api_epilog (zh=0x32b4c80, rc=0) at .../zookeeper/src/c/src/zookeeper.c:1661
#8  0x00007f5d79362f2f in adaptor_finish (zh=0x32b4c80) at .../zookeeper/src/c/src/mt_adaptor.c:205
#9  0x00007f5d79358c8c in zookeeper_close (zh=0x32b4c80) at .../zookeeper/src/c/src/zookeeper.c:2297
....

I've seen some threads online about a race condition associated with zookeeper_close, where if your app is making a synchronous call at the same time using the closed zk_handle, there could be a hang. However, my app makes no synchronous calls, and I'm 99% sure that no other thread in my app is making any concurrent call into the library ('thread apply all bt' in gdb doesn't show any other usage of the library, anyway).

Has anyone seen this before? Any leads?

Thanks,

Jeremy