From: Jean-Marc Spaggiari
Date: Fri, 7 Nov 2014 07:57:26 -0500
Subject: Re: hbase cannot normally start regionserver in the environment of big data.
To: user@hbase.apache.org

Hi,

Have you checked that your Hadoop is running fine? Have you checked that the
network between your servers is fine too?

JM

2014-11-07 5:22 GMT-05:00 hankedang@sina.cn:

> I've deployed a "2+4" cluster which had been running normally for a long
> time. The cluster holds more than 40 TB of data. When I deliberately shut
> down the HBase service and try to restart it, the regionservers die.
>
> The regionserver log shows that all the regions are opened, but the
> datanode logs contain WARN and ERROR entries.
> Below is the log with the details:
>
> 2014-11-07 14:47:21,584 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /10.230.63.12:50010, dest: /10.230.63.9:39405, bytes: 4696, op: HDFS_READ, cliID: DFSClient_hb_rs_salve1,60020,1415342303886_-2037622978_29, offset: 31996928, srvID: bb0032a3-1170-4a34-b85b-e2cfa0d56cb2, blockid: BP-1731746090-10.230.63.3-1406195669990:blk_1078709392_4968828, duration: 7978822
> 2014-11-07 14:47:21,596 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: exception:
> java.net.SocketTimeoutException: 480000 millis timeout while waiting for channel to be ready for write. ch : java.nio.channels.SocketChannel[connected local=/10.230.63.12:50010 remote=/10.230.63.11:41511]
>         at org.apache.hadoop.net.SocketIOWithTimeout.waitForIO(SocketIOWithTimeout.java:246)
>         at org.apache.hadoop.net.SocketOutputStream.waitForWritable(SocketOutputStream.java:172)
>         at org.apache.hadoop.net.SocketOutputStream.transferToFully(SocketOutputStream.java:220)
>         at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendPacket(BlockSender.java:547)
>         at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:712)
>         at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:479)
>         at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opReadBlock(Receiver.java:110)
>         at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:68)
>         at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:229)
>         at java.lang.Thread.run(Thread.java:744)
> 2014-11-07 14:47:21,599 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /10.230.63.12:50010, dest: /10.230.63.11:41511, bytes: 726528, op: HDFS_READ, cliID: DFSClient_hb_rs_salve3,60020,1415342303807_1094119849_29, offset: 0, srvID: bb0032a3-1170-4a34-b85b-e2cfa0d56cb2, blockid: BP-1731746090-10.230.63.3-1406195669990:blk_1078034913_4294168, duration: 480190668115
> 2014-11-07 14:47:21,599 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(10.230.63.12, datanodeUuid=bb0032a3-1170-4a34-b85b-e2cfa0d56cb2, infoPort=50075, ipcPort=50020, storageInfo=lv=-55;cid=cluster12;nsid=395652542;c=0):Got exception while serving BP-1731746090-10.230.63.3-1406195669990:blk_1078034913_4294168 to /10.230.63.11:41511
> java.net.SocketTimeoutException: 480000 millis timeout while waiting for channel to be ready for write. ch : java.nio.channels.SocketChannel[connected local=/10.230.63.12:50010 remote=/10.230.63.11:41511]
>         at org.apache.hadoop.net.SocketIOWithTimeout.waitForIO(SocketIOWithTimeout.java:246)
>         at org.apache.hadoop.net.SocketOutputStream.waitForWritable(SocketOutputStream.java:172)
>         at org.apache.hadoop.net.SocketOutputStream.transferToFully(SocketOutputStream.java:220)
>         at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendPacket(BlockSender.java:547)
>         at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:712)
>         at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:479)
>         at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opReadBlock(Receiver.java:110)
>         at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:68)
>         at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:229)
>         at java.lang.Thread.run(Thread.java:744)
> 2014-11-07 14:47:21,600 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: salve4:50010:DataXceiver error processing READ_BLOCK operation src: /10.230.63.11:41511 dest: /10.230.63.12:50010
>
> I personally think this happens during the load-on-open stage, when the
> disk I/O of the cluster is very high and the pressure is huge.
>
> I would like to know what causes the read errors while reading the HFiles,
> and what leads to the timeouts. Are there any settings that can control the
> speed of loading on open and reduce the pressure on the cluster?
>
> I need help!
>
> Thanks!
>
> hankedang@sina.cn
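
For anyone hitting the same thing: the 480000 ms in the stack traces matches
the stock dfs.datanode.socket.write.timeout (8 minutes), i.e. the datanode is
not failing the reads outright, it is waiting the full default timeout for the
regionserver side to drain the data. Below is a rough hbase-site.xml /
hdfs-site.xml sketch of the knobs usually discussed for this situation; the
property names and defaults are assumptions based on stock HBase 0.98 /
Hadoop 2.x and are not confirmed for the versions running on this cluster.

<!-- hbase-site.xml on the regionservers: fewer threads opening regions in
     parallel means fewer concurrent HFile load-on-open reads hitting the
     datanodes at restart time (assumed default is 3). -->
<property>
  <name>hbase.regionserver.executor.openregion.threads</name>
  <value>1</value>
</property>

<!-- hdfs-site.xml on the datanodes: raising the write timeout only buys
     headroom while the disks catch up; it does not remove the pressure. -->
<property>
  <name>dfs.datanode.socket.write.timeout</name>
  <value>960000</value>
</property>

Throttling the open-region executor is the closest thing to "controlling the
speed of loading on open"; the timeout change is better treated as a stopgap
while the cause of the slow disk I/O is investigated.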