Date: Tue, 21 Dec 2010 11:08:05 +0800
From: Zhou Shuaifeng
To: user@hbase.apache.org
Cc: yanlijun@huawei.com, syang@huawei.com
Subject: Re: all regionserver shutdown after close hdfs datanode
Message-id: <010001cba0bc$4b8256c0$e2870440$@com>
In-reply-to: <4D0F7A43.9070007@1and1.ro>
References: <00be01cb9ffc$8e36eb90$aaa4c2b0$@com> <4D0F7A43.9070007@1and1.ro>

Hi,

I checked the log. It was not the master that caused the regionserver shutdown; the regionserver shut down because its log rolling failed.

According to the log, the error occurred in the write pipeline. But why is HDFS not able to select another good datanode when one datanode in the pipeline is unavailable?

The log:

2010-12-20 09:15:41,769 FATAL org.apache.hadoop.hbase.regionserver.LogRoller: Log rolling failed with ioe:
java.io.IOException: Error Recovery for block blk_1292656843439_2494096 failed because recovery from primary datanode 167.6.5.17:50010 failed 6 times. Pipeline was 167.6.5.17:50010. Aborting...
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:3249)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:2654)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2837)

The corresponding code in the regionserver:

    LOG.fatal("Log rolling failed with ioe: ",
        RemoteExceptionHandler.checkIOException(ex));
    server.checkFileSystem();
    // Abort if we get here. We probably won't recover an IOE. HBASE-1132
    server.abort();

The abort() code:

    public void abort() {
      this.abortRequested = true;
      this.reservedSpace.clear();
      LOG.info("Dump of metrics: " + this.metrics.toString());
      stop();
    }

The corresponding log:

2010-12-20 09:15:41,777 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Dump of metrics: request=9.666667, regions=1512, stores=1512, storefiles=5833, storefileIndexSize=1833, memstoreSize=2941, compactionQueueSize=1228, usedHeap=6849, maxHeap=8165, blockCacheSize=14047672, blockCacheFree=1698276936, blockCacheCount=0, blockCacheHitRatio=0, fsReadLatency=0, fsWriteLatency=59, fsSyncLatency=0

Zhou Shuaifeng(Frank)
HUAWEI TECHNOLOGIES CO.,LTD.

-----Original Message-----
From: Daniel Iancu [mailto:daniel.iancu@1and1.ro]
Sent: 20 December 2010 23:46
To: user@hbase.apache.org
Subject: Re: all regionserver shutdown after close hdfs datanode

Hi Zhou

You should check whether the HMaster is still up. If not, check its logs: if for some reason the HMaster thinks HDFS is not available, it will shut down the HBase cluster.

Regards
Daniel

On 12/20/2010 06:15 AM, Zhou Shuaifeng wrote:
> Hi,
>
> I have a cluster of 8 HDFS datanodes and 8 HBase regionservers. When I
> shut down one node (a PC running one datanode and one regionserver), all
> HBase regionservers shut down after a while.
>
> The other 7 HDFS datanodes are OK.
>
> I think this is not reasonable. HBase is a distributed system that should
> tolerate some nodes failing. So, what's the matter? Is there any
> configuration that can solve this problem, or is it a bug?
>
> Thanks and best regards.
>
> Zhou

-- 
Daniel Iancu
Java Developer, Web Components Romania
1&1 Internet Development srl.
18 Mircea Eliade St
Sect 1, Bucharest
RO Bucharest, 012015
www.1and1.ro
Phone:+40-031-223-9081
Email:daniel.iancu@1and1.ro
IM:diancu@united.domain
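
For anyone following the thread, the abort path quoted above can be sketched in isolation. This is only a minimal illustration under stated assumptions, not HBase code: `RollableLog`, `FakeServer`, and `rollOnce` are hypothetical names made up for the sketch, and the `abort()` quoted in the message also clears reserved space and dumps metrics, which is left out here. It shows the point under discussion: a single IOException during a log roll is enough to set the abort flag and stop the server, with no retry.

```java
import java.io.IOException;

public class LogRollAbortSketch {

    /** Hypothetical stand-in for the write-ahead log; not the HBase API. */
    interface RollableLog {
        void rollWriter() throws IOException;
    }

    /** Hypothetical stand-in for the region server; models only the flags. */
    static class FakeServer {
        volatile boolean abortRequested = false;
        volatile boolean stopRequested = false;

        // Mirrors the shape of the quoted abort(): set the flag, then stop.
        void abort() {
            abortRequested = true;
            stop();
        }

        void stop() {
            stopRequested = true;
        }
    }

    /** One roll attempt: any IOException aborts the server, with no retry. */
    static void rollOnce(RollableLog log, FakeServer server) {
        try {
            log.rollWriter();
        } catch (IOException ex) {
            System.err.println("Log rolling failed with ioe: " + ex.getMessage());
            server.abort();
        }
    }

    public static void main(String[] args) {
        // A roll that succeeds leaves the server running.
        FakeServer healthy = new FakeServer();
        rollOnce(() -> { }, healthy);
        System.out.println("healthy aborted? " + healthy.abortRequested);

        // A roll that hits an IOException (e.g. failed pipeline recovery) aborts it.
        FakeServer broken = new FakeServer();
        rollOnce(() -> { throw new IOException("pipeline recovery failed"); }, broken);
        System.out.println("broken aborted? " + broken.abortRequested);
    }
}
```

Because every regionserver rolls its own log against the same HDFS, each one that hits this exception aborts independently, which would be consistent with all of them going down "after a while" rather than at once.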