From: Daniel Iancu
Organization: 1and1
To: user@hbase.apache.org
Date: Mon, 23 May 2011 19:27:07 +0300
Subject: live regionservers reported dead

Hello everybody,

I've run into a strange problem. We run a 6-regionserver cluster, and suddenly the client application started reporting "region not online" errors. In the web console all regionservers appeared to be up. I ran hbck and got strange results:

Number of Tables: 2
Number of live region servers: 6
Number of dead region servers: 12

The cluster was in an inconsistent state. With status 'detailed' in the hbase shell I got the dead machines:

12 dead servers
    search-hadoop-eu006.v300.gmx.net,60020,1305025929461
    search-hadoop-eu002.v300.gmx.net,60020,1305019508570
    search-hadoop-eu004.v300.gmx.net,60020,1305019551236
    search-hadoop-eu003.v300.gmx.net,60020,1305025688666
    search-hadoop-eu005.v300.gmx.net,60020,1305025841017
    search-hadoop-eu006.v300.gmx.net,60020,1306156842070
    search-hadoop-eu005.v300.gmx.net,60020,1305019568146
    search-hadoop-eu001.v300.gmx.net,60020,1305025543786
    search-hadoop-eu004.v300.gmx.net,60020,1305025761173
    search-hadoop-eu002.v300.gmx.net,60020,1305025611163
    search-hadoop-eu006.v300.gmx.net,60020,1305019572576
    search-hadoop-eu003.v300.gmx.net,60020,1305019547053

It appears that all live regionservers are also listed as dead. I tried hbck -fix and the cluster is now in an OK state, but it still reports the same 12 machines as dead. I've checked the logs but found nothing obvious. Any idea?

We use CDH3u0.

Thanks,
Daniel
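HBase identifies a regionserver by a name of the form host,port,startcode, where the start code is the server's start time in milliseconds since the epoch. Grouping the dead-server list by host:port makes it easy to see that the 12 "dead" entries all belong to the six cluster hosts (eu001 to eu006), each appearing with one or more different start codes. The minimal Python sketch below assumes exactly that name format; the helper names are hypothetical and not part of HBase.

```python
from collections import defaultdict
from datetime import datetime, timezone

# Dead-server entries copied from the status 'detailed' output in the message.
DEAD = """\
search-hadoop-eu006.v300.gmx.net,60020,1305025929461
search-hadoop-eu002.v300.gmx.net,60020,1305019508570
search-hadoop-eu004.v300.gmx.net,60020,1305019551236
search-hadoop-eu003.v300.gmx.net,60020,1305025688666
search-hadoop-eu005.v300.gmx.net,60020,1305025841017
search-hadoop-eu006.v300.gmx.net,60020,1306156842070
search-hadoop-eu005.v300.gmx.net,60020,1305019568146
search-hadoop-eu001.v300.gmx.net,60020,1305025543786
search-hadoop-eu004.v300.gmx.net,60020,1305025761173
search-hadoop-eu002.v300.gmx.net,60020,1305025611163
search-hadoop-eu006.v300.gmx.net,60020,1305019572576
search-hadoop-eu003.v300.gmx.net,60020,1305019547053
"""


def parse_server_name(name):
    """Split a server name of the assumed form host,port,startcode."""
    host, port, startcode = name.strip().split(",")
    return host, int(port), int(startcode)


def group_by_host(lines):
    """Map each (host, port) to the sorted start codes reported for it."""
    groups = defaultdict(list)
    for line in lines:
        if line.strip():  # skip blank lines
            host, port, startcode = parse_server_name(line)
            groups[(host, port)].append(startcode)
    return {key: sorted(codes) for key, codes in groups.items()}


def startcode_to_utc(startcode):
    """Decode a start code, assumed to be epoch milliseconds, as a UTC datetime."""
    return datetime.fromtimestamp(startcode / 1000, tz=timezone.utc)


if __name__ == "__main__":
    for (host, port), codes in sorted(group_by_host(DEAD.splitlines()).items()):
        stamps = ", ".join(
            startcode_to_utc(c).strftime("%Y-%m-%d %H:%M:%S") for c in codes
        )
        print(f"{host}:{port} -> {len(codes)} entries: {stamps}")
```

Run against the list above, this should report six hosts. eu006 has three entries, and each of the others has one or two. Comparing the decoded start times with each live server's current start code would show which entries are stale registrations from earlier restarts.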