Subject: Re: HBase issues since upgrade from 0.92.4 to 0.94.6
From: Azuryy Yu
To: user@hbase.apache.org
Date: Fri, 12 Jul 2013 19:41:55 +0800

David,

java.io.IOException: Premature EOF from inputStream
        at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:194)

For this error: generally the client is still asking for bytes from the
stream, but the server side has been shut down, so the cause may be a
network issue, a JVM crash, or something else. I don't think this is
related to the HBase upgrade.

On Fri, Jul 12, 2013 at 7:32 PM, Jean-Marc Spaggiari <
jean-marc@spaggiari.org> wrote:

> You might want to run memtest as well, just to be sure there is no memory
> issue. It should not be that, since it was working fine with 0.92.4, but
> checking costs nothing...
>
> Also, the last version of Java 6 is update 45. It might be worth a try if
> you are running 1.6.
>
> 2013/7/12 Asaf Mesika
>
> > You need to look for the JVM crash in the .out log file and see if it is
> > the native Hadoop .so code that is causing the problem. In our case we
> > downgraded from JVM 1.6.0_37 to _33 and it solved the issue.
> >
> > On Friday, July 12, 2013, David Koch wrote:
> >
> > > Hello,
> > >
> > > NOTE: I posted the same message in the Cloudera group.
> > >
> > > Since upgrading from CDH 4.0.1 (HBase 0.92.4) to 4.3.0 (HBase 0.94.6)
> > > we systematically experience problems with region servers crashing
> > > silently under workloads which used to pass without problems. More
> > > specifically, we run about 30 mapper jobs in parallel which read from
> > > HDFS and insert into HBase.
> > >
> > > region server log
> > > NOTE: no trace of the crash, but the server is down and shows up as
> > > such in Cloudera Manager.
> > >
> > > 2013-07-12 10:22:12,050 WARN org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: File hdfs://XXXXXXX:8020/hbase/.logs/XXXXXXX,60020,1373616547696-splitting/XXXXXXX%2C60020%2C1373616547696.1373617004286 might be still open, length is 0
> > > 2013-07-12 10:22:12,051 INFO org.apache.hadoop.hbase.util.FSHDFSUtils: Recovering file hdfs://XXXXXXX:8020/hbase/.logs/XXXXXXX,60020,1373616547696-splitting/XXXXXXX%2C60020%2C1373616547696.1373617004286
> > > 2013-07-12 10:22:13,064 INFO org.apache.hadoop.hbase.util.FSHDFSUtils: Finished lease recover attempt for hdfs://XXXXXXX:8020/hbase/.logs/XXXXXXX,60020,1373616547696-splitting/XXXXXXX%2C60020%2C1373616547696.1373617004286
> > > 2013-07-12 10:22:14,819 INFO org.apache.hadoop.io.compress.CodecPool: Got brand-new compressor [.deflate]
> > > 2013-07-12 10:22:14,824 INFO org.apache.hadoop.io.compress.CodecPool: Got brand-new compressor [.deflate]
> > > ...
> > > 2013-07-12 10:22:14,850 INFO org.apache.hadoop.io.compress.CodecPool: Got brand-new compressor [.deflate]
> > > 2013-07-12 10:22:15,530 INFO org.apache.hadoop.io.compress.CodecPool: Got brand-new compressor [.deflate]
> > > < -- last log entry, region server is down here -- >
> > >
> > > datanode log, same machine
> > >
> > > 2013-07-12 10:22:04,811 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: XXXXXXX:50010:DataXceiver error processing WRITE_BLOCK operation src: /YYY.YY.YYY.YY:36024 dest: /XXX.XX.XXX.XX:50010
> > > java.io.IOException: Premature EOF from inputStream
> > >         at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:194)
> > >         at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doReadFully(PacketReceiver.java:213)
> > >         at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doRead(PacketReceiver.java:134)
> > >         at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.receiveNextPacket(PacketReceiver.java:109)
> > >         at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:414)
> > >         at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:635)
> > >         at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:564)
> > >         at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:103)
> > >         at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:67)
> > >         at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:221)
> > >         at java.lang.Thread.run(Thread.java:724)
> > > < -- many repetitions of this -- >
> > >
> > > What could have caused this difference in stability?
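Replying inline here: the "Premature EOF from inputStream" in the datanode
trace above is the generic symptom of the peer closing the connection in
the middle of a packet. Any readFully-style call that expects N bytes
throws as soon as the stream ends early; if the region server JVM died,
its open write pipelines would end on the datanode side looking just like
this. A minimal sketch of the same failure mode with plain java.io (an
illustration only, not Hadoop's actual IOUtils code):

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.EOFException;

public class PrematureEof {
    public static void main(String[] args) throws Exception {
        // The receiver expects an 8-byte "packet", but the stream ends
        // after 3 bytes, as if the writer's JVM died mid-transfer.
        byte[] truncated = {1, 2, 3};
        DataInputStream in =
                new DataInputStream(new ByteArrayInputStream(truncated));
        byte[] packet = new byte[8];
        try {
            in.readFully(packet); // needs 8 bytes, only 3 are available
            System.out.println("read ok");
        } catch (EOFException e) {
            // Hadoop's IOUtils.readFully reports this same condition as
            // "Premature EOF from inputStream".
            System.out.println("premature EOF");
        }
    }
}
```

So the datanode error by itself does not implicate HBase 0.94; it only
tells you the writer went away mid-block.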
> > >
> > > We did not change any configuration settings with respect to the
> > > previous CDH 4.0.1 setup. In particular, we left ulimit and
> > > dfs.datanode.max.xcievers at 32k. If need be, I can provide more
> > > complete log/configuration information.
> > >
> > > Thank you,
> > >
> > > /David
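A note on the xceiver setting mentioned above: in hdfs-site.xml it would
look like the sketch below (assuming "32k" means 32768; the property name
really is spelled "xcievers" in this Hadoop line):

```xml
<!-- hdfs-site.xml sketch: the setting mentioned in the message, left at
     32k. The name carries Hadoop's historical misspelling "xcievers". -->
<property>
  <name>dfs.datanode.max.xcievers</name>
  <value>32768</value>
</property>
```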