From: Jean-Marc Spaggiari
Date: Thu, 8 Jan 2015 11:10:53 -0500
Subject: Re: HBase Regionserver very easy to die
To: user

Hi,

How is your HDFS doing? Have you looked at FSCK, the NameNode interface, etc.?
Sounds like HBase is not able to write to it...

JM

2015-01-08 3:13 GMT-05:00 gao <895407214gao@sina.com>:

> Hi:
>
> I am getting constant stability problems with the HBase Regionserver: it
> dies randomly every day or every other day. It normally dies shortly after
> printing the following:
>
> 2014-12-30 23:06:17,091 ERROR [regionserver60020.logRoller] wal.ProtobufLogWriter: Got IOException while writing trailer
> org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /hbase/WALs/zjdx107,60020,1418269148759/zjdx107%2C60020%2C1418269148759.1419977176935 could only be replicated to 0 nodes instead of minReplication (=1). There are 12 datanode(s) running and no node(s) are excluded in this operation.
>     at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget(BlockManager.java:1430)
>     at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2659)
>     at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:569)
>     at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:440)
>     at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>     at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
>     at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026)
>     at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1986)
>     at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1982)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:415)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
>     at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1980)
>
>     at org.apache.hadoop.ipc.Client.call(Client.java:1409)
>     at org.apache.hadoop.ipc.Client.call(Client.java:1362)
>     at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
>     at com.sun.proxy.$Proxy13.addBlock(Unknown Source)
>     at sun.reflect.GeneratedMethodAccessor31.invoke(Unknown Source)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:606)
>     at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:186)
>     at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
>     at com.sun.proxy.$Proxy13.addBlock(Unknown Source)
>     at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:361)
>     at sun.reflect.GeneratedMethodAccessor30.invoke(Unknown Source)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:606)
>     at org.apache.hadoop.hbase.fs.HFileSystem$1.invoke(HFileSystem.java:266)
>     at com.sun.proxy.$Proxy14.addBlock(Unknown Source)
>     at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.locateFollowingBlock(DFSOutputStream.java:1437)
>     at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1260)
>     at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:525)
>
> 2014-12-30 23:06:17,092 ERROR [regionserver60020.logRoller] wal.FSHLog: Failed close of HLog writer
> org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /hbase/WALs/zjdx107,60020,1418269148759/zjdx107%2C60020%2C1418269148759.1419977176935 could only be replicated to 0 nodes instead of minReplication (=1). There are 12 datanode(s) running and no node(s) are excluded in this operation.
>     at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget(BlockManager.java:1430)
>     at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2659)
>     at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:569)
>     at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:440)
>     at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>     at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
>     at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026)
>     at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1986)
>     at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1982)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:415)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
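
As a small aid for the HDFS check suggested in the reply, here is a rough Python sketch that pulls the numbers out of the NameNode message quoted in this thread. The function name `parse_placement_failure` and the keys it returns are assumptions made for illustration and are not part of HBase or Hadoop. The pattern is written only against the wording seen in this log, so it may need adjusting for other Hadoop versions.

```python
import re

# Wording taken from the NameNode error quoted in this thread; other Hadoop
# versions may phrase it differently, so treat the pattern as an assumption.
PLACEMENT_FAILURE = re.compile(
    r"File (?P<path>\S+) could only be replicated to (?P<replicated>\d+) nodes "
    r"instead of minReplication \(=(?P<min_replication>\d+)\)\. "
    r"There are (?P<datanodes>\d+) datanode\(s\) running and "
    r"(?P<excluded>no|\d+) node\(s\) are excluded"
)


def parse_placement_failure(message):
    """Return the counts from a block-placement failure message, or None."""
    # Collapse runs of whitespace so double-spaced log lines still match.
    normalized = " ".join(message.split())
    match = PLACEMENT_FAILURE.search(normalized)
    if match is None:
        return None
    excluded = match.group("excluded")
    return {
        "path": match.group("path"),
        "replicated": int(match.group("replicated")),
        "min_replication": int(match.group("min_replication")),
        "datanodes_running": int(match.group("datanodes")),
        "excluded_nodes": 0 if excluded == "no" else int(excluded),
    }


if __name__ == "__main__":
    # Message copied from the log in this thread.
    sample = (
        "org.apache.hadoop.ipc.RemoteException(java.io.IOException): File "
        "/hbase/WALs/zjdx107,60020,1418269148759/"
        "zjdx107%2C60020%2C1418269148759.1419977176935 "
        "could only be replicated to 0 nodes instead of minReplication (=1). "
        "There are 12 datanode(s) running and no node(s) are excluded "
        "in this operation."
    )
    print(parse_placement_failure(sample))
```

For the log in this thread the message says that 0 of the 1 required replicas could be placed even though 12 datanodes are running and none were excluded. That points the investigation at the state of the datanodes (for example free space, load, or connectivity) rather than at HBase itself, which is where `hdfs fsck` and `hdfs dfsadmin -report` would usually come in.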