Message-ID: <3233913.1192812771382.JavaMail.jira@brutus>
Date: Fri, 19 Oct 2007 09:52:51 -0700 (PDT)
From: "Jim Kellerman (JIRA)"
To: hadoop-dev@lucene.apache.org
Subject: [jira] Created: (HADOOP-2079) [hbase] HLog generates incorrect file name when splitting a log, race condition also contributes

[hbase] HLog generates incorrect file name when splitting a log, race condition also contributes
-------------------------------------------------------------------------------------------------

Key: HADOOP-2079
URL:
https://issues.apache.org/jira/browse/HADOOP-2079
Project: Hadoop
Issue Type: Bug
Components: contrib/hbase
Affects Versions: 0.16.0
Reporter: Jim Kellerman
Assignee: Jim Kellerman
Fix For: 0.16.0

In Hadoop-Nightly #277, TestRegionServerExit failed with a timeout. The cause was a race in the Master: checkAssigned (run from either the root or meta scanner) immediately tries to split the log and then assign a region that has invalid server info.

The scenario went something like this:

1. The region server aborted.
2. The root region was written on an optional cache flush; the lease timed out on the aborted server, which removes it from serversToServerInfo and queues a PendingServerShutdown operation.
3. The root scanner runs and finds the server info incorrect (it is in the root region, but the server is not in serversToServerInfo).
4. checkAssigned starts splitting the log, but because the log name is incorrect it can't finish.
5. PendingServerShutdown fires and really gums up the works.

So there are two problems:

1. HLog.splitLog needs to generate the correct log file name.
2. PendingServerShutdown and/or leaseExpired need to cooperate with checkAssigned so that there are not two concurrent attempts to recover the log.
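
To make the second problem concrete, here is a minimal sketch of one way the two code paths could cooperate: whichever path first claims a dead server performs the log recovery, and the other skips it. This is only an illustration under stated assumptions, not the fix applied in HBase. The class LogRecoveryGuard, its methods, and the server address are hypothetical, and the counter stands in for the actual call to HLog.splitLog.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

/**
 * Hypothetical helper (not part of HBase): records which dead servers
 * already have a log recovery claimed, so only one caller splits the log.
 */
class LogRecoveryGuard {
    // Addresses of servers whose log has been claimed for recovery.
    private final Set<String> claimed = ConcurrentHashMap.newKeySet();

    /** Returns true only for the first caller that claims this server. */
    public boolean tryClaim(String serverAddress) {
        return claimed.add(serverAddress);
    }

    /** Drops the claim so a failed recovery can be retried later. */
    public void release(String serverAddress) {
        claimed.remove(serverAddress);
    }

    public static void main(String[] args) throws InterruptedException {
        LogRecoveryGuard guard = new LogRecoveryGuard();
        AtomicInteger splits = new AtomicInteger();

        // Both paths run the same check; the address is a made-up placeholder.
        Runnable recover = () -> {
            if (guard.tryClaim("10.0.0.1:60020")) {
                splits.incrementAndGet(); // stand-in for splitting the log
            }
        };

        // Thread names mirror the two competing paths described above.
        Thread scanner = new Thread(recover, "checkAssigned");
        Thread shutdown = new Thread(recover, "PendingServerShutdown");
        scanner.start();
        shutdown.start();
        scanner.join();
        shutdown.join();

        System.out.println("log splits performed: " + splits.get()); // prints 1
    }
}
```

The claim is a single atomic set insertion, so no lock is held while the log is being split. A caller whose recovery fails would call release so that a later attempt can claim the server again.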