Date: Tue, 04 May 2010 14:41:00 -0700
From: Mahadev Konar
Reply-To: zookeeper-user@hadoop.apache.org
CC: Kapil Thangavelu
Subject: Re: avoiding deadlocks on client handle close w/ python/c api

Sure, I'll take a look at it.

Thanks
mahadev


On 5/4/10 2:32 PM, "Patrick Hunt" wrote:

> Thanks Kapil, Mahadev perhaps you could take a look at this as well?
>
> Patrick
>
> On 05/04/2010 06:36 AM, Kapil Thangavelu wrote:
>> I've constructed a simple example, using just the zkpython library with
>> condition variables, that will deadlock. I've filed a new ticket for it:
>>
>> https://issues.apache.org/jira/browse/ZOOKEEPER-763
>>
>> The gdb stack traces look suspiciously like the ones in ZOOKEEPER-591,
>> but sans the watchers:
>>
>> https://issues.apache.org/jira/browse/ZOOKEEPER-591
>>
>> The attached example on the ticket will deadlock in zk 3.3.0 (which has
>> the fix for 591) and trunk.
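
For context, a minimal sketch of the kind of zkpython-plus-condition-variable
usage being described. This is not the attachment on ZOOKEEPER-763; the
connection string and znode path are placeholders, and the intent is only to
show the shape of "close while an async completion is outstanding":

    # Sketch only: issue an async request, then close the handle while the
    # completion callback may still be outstanding.
    import threading
    import zookeeper

    connected = threading.Condition()

    def session_watcher(handle, ev_type, state, path):
        # Session watcher: wake the main thread once the session event fires.
        with connected:
            connected.notify()

    with connected:
        handle = zookeeper.init("localhost:2181", session_watcher, 10000)
        connected.wait(10)

    def on_get(handle, rc, value, stat):
        # Completion callback, invoked on the zookeeper completion thread.
        print("aget completed, rc=%d" % rc)

    zookeeper.aget(handle, "/some-node", None, on_get)

    # Closing while the aget completion is outstanding is where the reported
    # hang shows up: close() joins the completion thread, which may never
    # reach a joinable state.
    zookeeper.close(handle)
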
>>
>> -kapil
>>
>> On Mon, May 3, 2010 at 9:48 PM, Kapil Thangavelu wrote:
>>
>>> Hi Folks,
>>>
>>> I'm constructing an async API on top of the zookeeper python bindings
>>> for twisted. The intent is a thin wrapper around the existing async API
>>> that allows for integration with the twisted python event loop
>>> (http://www.twistedmatrix.com), primarily using the async APIs.
>>>
>>> One issue I'm running into while developing unit tests: deadlocks occur
>>> if we attempt to close a handle while there are any outstanding async
>>> requests (aget, acreate, etc.). Normally on close, both the IO thread
>>> and the completion thread are terminated and joined; however, with
>>> outstanding async requests, the completion thread won't be in a joinable
>>> state, and we effectively hang when the main thread does the join.
>>>
>>> I'm curious if this would be considered a bug. AFAICS the ideal behavior
>>> on close of a handle would be to effectively clear out any remaining
>>> callbacks and let the completion thread terminate.
>>>
>>> I've tried adding some bookkeeping to the API to guard against closing
>>> while there is an outstanding completion request, but it's an imperfect
>>> solution due to the nature of the event loop integration. The problem is
>>> that the python callback invoked by the completion thread in turn
>>> schedules a function for the main thread. In twisted, the API for this
>>> is implemented by appending the function to a list attribute on the
>>> reactor and then writing a byte to a pipe to wake up the main thread. If
>>> a thread switch to the main thread occurs before the completion thread
>>> callback returns, the scheduled function runs and the rest of the
>>> application keeps processing; the last step for the unit tests is to
>>> close the connection, which results in a deadlock.
>>>
>>> I've included some of the client log and gdb stack traces from a
>>> deadlocked client process.
>>>
>>> thanks,
>>>
>>> Kapil
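
For readers following along, a rough sketch (not Kapil's actual wrapper) of
the kind of thin Twisted layer being described: the zkpython completion fires
on the completion thread, and Twisted's reactor.callFromThread is the API
that appends the function to the reactor's list and writes the wake-up byte
mentioned above. The class and method names here are illustrative only.

    import zookeeper
    from twisted.internet import defer, reactor

    class TwistedZooKeeper(object):
        """Illustrative wrapper: bridge zkpython completions to Deferreds."""

        def __init__(self, handle):
            self.handle = handle

        def get(self, path):
            d = defer.Deferred()

            def completion(handle, rc, value, stat):
                # Runs on the zookeeper completion thread; hand the result
                # back to the reactor (main) thread before touching the
                # Deferred.
                if rc == zookeeper.OK:
                    reactor.callFromThread(d.callback, (value, stat))
                else:
                    reactor.callFromThread(
                        d.errback, Exception("zookeeper error rc=%d" % rc))

            zookeeper.aget(self.handle, path, None, completion)
            return d

        def close(self):
            # The step at issue in this thread: if a completion is still
            # outstanding, close() joins the completion thread and can hang.
            zookeeper.close(self.handle)

A test built on this would typically do d = client.get("/node"),
add callbacks, and then close the connection as its last step, which is
exactly the close-while-outstanding ordering that triggers the hang.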