Date: Fri, 01 Jun 2007 09:39:46 -0700
From: Doug Cutting
To: hadoop-user@lucene.apache.org
Subject: Re: Bad concurrency bug in 0.12.3?
Calvin Yu wrote:
> The problem seems to be with the MapTask's (MapTask.java) sort
> progress thread (line #196) not stopping after the sort is completed,
> and hence the call to join() (line #190) never returns. This is
> because that thread is only catching the InterruptedException, and not
> checking the thread's interrupted flag as well. According to the
> Javadocs, an InterruptedException is thrown only if the Thread is in
> the middle of the sleep(), wait(), join(), etc. calls, and during
> normal operations only the interrupted flag is set.

I think that if a thread is interrupted, so that its interrupt flag is set, and sleep() is then called, sleep() should immediately throw an InterruptedException. That's what the javadoc implies to me:

  Throws: InterruptedException - if another thread has interrupted the current thread.

So this could be a JVM bug, or perhaps that's not the contract.

I think we should fix this as part of HADOOP-1431. We should change that thread to use the mechanism we use elsewhere: a 'running' flag that's checked in the thread's main loop, and a method to stop the thread that sets this flag to false and interrupts it. That works reliably in many places.

Perhaps for the 0.14 release this logic should be abstracted into a base class for Daemon threads, so that we don't re-invent it everywhere.

Doug
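A minimal sketch of the 'running' flag plus interrupt mechanism, folded into the kind of base class suggested above. StoppableDaemon, doWork(), shutdown(), SortProgressThread and StopDemo are hypothetical names for illustration, not existing Hadoop classes, and the real sort progress logic is replaced by a placeholder counter and a one-second sleep. When sleep() throws InterruptedException it also clears the thread's interrupt status, which is one reason the loop tests its own flag instead of relying on the interrupt status alone.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical base class, NOT an existing Hadoop API: factors out the
// 'running' flag + interrupt pattern so threads don't re-invent it.
abstract class StoppableDaemon extends Thread {
    // volatile so the loop sees the write made by the stopping thread
    private volatile boolean running = true;

    protected StoppableDaemon(String name) {
        super(name);
        setDaemon(true);
    }

    /** One iteration of work; may block in sleep() or wait(). Hypothetical hook. */
    protected abstract void doWork() throws InterruptedException;

    @Override
    public final void run() {
        while (running) {
            try {
                doWork();
            } catch (InterruptedException e) {
                // The interrupt only wakes the thread up; the loop condition
                // decides whether to exit, so the cleared status is harmless.
            }
        }
    }

    /** Clear the flag first, then interrupt any blocking call in progress. */
    public void shutdown() {
        running = false;
        interrupt();
    }
}

// Hypothetical stand-in for the sort progress thread in MapTask.
class SortProgressThread extends StoppableDaemon {
    final AtomicInteger reports = new AtomicInteger();

    SortProgressThread() {
        super("sort progress");
    }

    @Override
    protected void doWork() throws InterruptedException {
        reports.incrementAndGet(); // placeholder for reporting progress
        Thread.sleep(1000);
    }
}

class StopDemo {
    /** Starts the thread, stops it, and reports whether it exited within 5 seconds. */
    static boolean stopsPromptly() {
        SortProgressThread t = new SortProgressThread();
        t.start();
        try {
            Thread.sleep(50); // give it time to enter sleep()
            t.shutdown();
            t.join(5000);     // bounded join: a bug shows up as 'false', not a hang
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        return !t.isAlive();
    }

    public static void main(String[] args) {
        System.out.println("stopped: " + stopsPromptly());
    }
}
```

Because shutdown() sets the flag before calling interrupt(), the thread exits whether it is blocked in sleep() or running between sleeps: either the loop check sees running == false directly, or, if the thread has already passed that check, its next sleep() throws immediately because the interrupt status is set, and the check then fails.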