From: "stack (JIRA)"
To: issues@hbase.apache.org
Date: Tue, 23 Nov 2010 15:12:14 -0500 (EST)
Subject: [jira] Updated: (HBASE-3266) Master does not seem to properly scan ZK for running RS during startup
Message-ID: <18209668.270161290543134091.JavaMail.jira@thor>
In-Reply-To: <28011155.255561290501313698.JavaMail.jira@thor>

     [ https://issues.apache.org/jira/browse/HBASE-3266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-3266:
-------------------------

    Fix Version/s: 0.90.0

Bringing into 0.90.0 while we triage.

> Master does not seem to properly scan ZK for running RS during startup
> ----------------------------------------------------------------------
>
>                 Key: HBASE-3266
>                 URL: https://issues.apache.org/jira/browse/HBASE-3266
>             Project: HBase
>          Issue Type: Bug
>          Components: master
>    Affects Versions: 0.90.0
>            Reporter: Todd Lipcon
>            Priority: Critical
>             Fix For: 0.90.0
>
>
> I was in the situation described by HBASE-3265, where I had a number of RS waiting on ROOT, but the master hadn't seen any RS checkins, so was waiting on checkins. To get past this, I restarted one of the region servers. The restarted server checked in, and the master began its startup.
> At this point the master started scanning /hbase/.logs for things to split. It correctly identified that the RS on haus01 was running (this is the one I restarted):
> 2010-11-23 00:21:25,595 INFO org.apache.hadoop.hbase.master.MasterFileSystem: Log folder hdfs://haus01.sf.cloudera.com:11020/hbase-normal/.logs/haus01.sf.cloudera.com,60020,1290500443143 belongs to an existing region server
> but then incorrectly decided that the RS on haus02 was down:
> 2010-11-23 00:21:25,595 INFO org.apache.hadoop.hbase.master.MasterFileSystem: Log folder hdfs://haus01.sf.cloudera.com:11020/hbase-normal/.logs/haus02.sf.cloudera.com,60020,1290498411450 doesn't belong to a known region server, splitting
> However ZK shows that this RS is up:
> [zk: haus01.sf.cloudera.com:2222(CONNECTED) 3] ls /hbase/rs
> [haus04.sf.cloudera.com,60020,1290498411533, haus05.sf.cloudera.com,60020,1290498411520, haus03.sf.cloudera.com,60020,1290498411518, haus01.sf.cloudera.com,60020,1290500443143, haus02.sf.cloudera.com,60020,1290498411450]
> splitLogsAfterStartup seems to check ServerManager.onlineServers, which best I can tell is derived from heartbeats and not from ZK (sorry if I got some of this wrong, still new to this new codebase)
> Of course, the master went into an infinite splitting loop at this point since haus02 is up and renewing its DFS lease on its logs.
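
The decision described in the report can be illustrated with a small standalone sketch. This is not the HBase code: the class `LogSplitSketch`, the method `foldersToSplit`, and its parameters are hypothetical names chosen for illustration, and the only assumption taken from the report is that a log folder name matches the server name listed under /hbase/rs. It shows how consulting only the heartbeat-derived set marks haus02 for splitting, while also consulting the ZK listing would not.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only; none of these names come from HBase.
class LogSplitSketch {

    /**
     * Returns the log folder names that belong to no live server.
     * A server counts as live if it appears in either liveness source:
     * the heartbeat-derived set or the set of znodes under /hbase/rs.
     */
    static List<String> foldersToSplit(List<String> logFolders,
                                       Set<String> heartbeatServers,
                                       Set<String> zkServers) {
        Set<String> live = new HashSet<>(heartbeatServers);
        live.addAll(zkServers);
        List<String> toSplit = new ArrayList<>();
        for (String folder : logFolders) {
            if (!live.contains(folder)) {
                toSplit.add(folder);
            }
        }
        return toSplit;
    }

    public static void main(String[] args) {
        // Server names taken from the log lines and ZK listing in the report.
        String haus01 = "haus01.sf.cloudera.com,60020,1290500443143";
        String haus02 = "haus02.sf.cloudera.com,60020,1290498411450";
        List<String> logFolders = List.of(haus01, haus02);

        // Only the restarted server (haus01) has checked in via heartbeat.
        Set<String> heartbeats = Set.of(haus01);
        // ZK lists both servers as registered.
        Set<String> zk = Set.of(haus01, haus02);

        // Heartbeats alone: haus02 looks dead and would be split.
        System.out.println(foldersToSplit(logFolders, heartbeats, Set.of()));
        // Heartbeats plus ZK: nothing is split.
        System.out.println(foldersToSplit(logFolders, heartbeats, zk));
    }
}
```

Taking the union of both sources is only one possible policy; whether the actual fix should read ZK directly or wait for more checkins is left open by the report.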