hbase-user mailing list archives

From Jean-Marc Spaggiari <jean-m...@spaggiari.org>
Subject Re: HBase issues since upgrade from 0.92.4 to 0.94.6
Date Fri, 12 Jul 2013 11:32:28 GMT
Might want to run memtest as well, just to be sure there is no memory issue.
It should not be the cause, since everything was working fine with 0.92.4, but it costs nothing...

Also, the latest version of Java 6 is update 45... It might be worth giving it a try if
you are running 1.6.
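
If you want to double-check which JVM the region servers actually run on, here is a minimal sketch (the class name is my own, not from this thread): compile it and run it with the same java binary that starts the region server.

public class ShowJvmVersion {
    public static void main(String[] args) {
        // Prints the version properties of the JVM executing this class; run
        // it with the region server's java binary to confirm whether you are
        // on 1.6.0_45 or an older update.
        System.out.println("java.version    = " + System.getProperty("java.version"));
        System.out.println("java.vm.version = " + System.getProperty("java.vm.version"));
        System.out.println("java.vm.name    = " + System.getProperty("java.vm.name"));
    }
}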

2013/7/12 Asaf Mesika <asaf.mesika@gmail.com>

> You need to look for the JVM crash in the .out log file and see if maybe it is the .so
> native Hadoop code that is causing the problem. In our case we
> downgraded from JVM 1.6.0-37 to 33 and that solved the issue.
>
>
> On Friday, July 12, 2013, David Koch wrote:
>
> > Hello,
> >
> > NOTE: I posted the same message in the Cloudera group.
> >
> > Since upgrading from CDH 4.0.1 (HBase 0.92.4) to 4.3.0 (HBase 0.94.6) we
> > systematically experience problems with region servers crashing silently
> > under workloads which used to pass without problems. More specifically, we
> > run about 30 mapper jobs in parallel which read from HDFS and insert into
> > HBase.
> >
> > Region server log
> > NOTE: no trace of a crash, but the server is down and shows up as such in
> > Cloudera Manager.
> >
> > 2013-07-12 10:22:12,050 WARN org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: File hdfs://XXXXXXX:8020/hbase/.logs/XXXXXXX,60020,1373616547696-splitting/XXXXXXX%2C60020%2C1373616547696.1373617004286 might be still open, length is 0
> > 2013-07-12 10:22:12,051 INFO org.apache.hadoop.hbase.util.FSHDFSUtils: Recovering file hdfs://XXXXXXX:8020/hbase/.logs/XXXXXXX,60020,1373616547696-splitting/XXXXXXXt%2C60020%2C1373616547696.1373617004286
> > 2013-07-12 10:22:13,064 INFO org.apache.hadoop.hbase.util.FSHDFSUtils: Finished lease recover attempt for hdfs://XXXXXXX:8020/hbase/.logs/XXXXXXX,60020,1373616547696-splitting/XXXXXXX%2C60020%2C1373616547696.1373617004286
> > 2013-07-12 10:22:14,819 INFO org.apache.hadoop.io.compress.CodecPool: Got brand-new compressor [.deflate]
> > 2013-07-12 10:22:14,824 INFO org.apache.hadoop.io.compress.CodecPool: Got brand-new compressor [.deflate]
> > ...
> > 2013-07-12 10:22:14,850 INFO org.apache.hadoop.io.compress.CodecPool: Got brand-new compressor [.deflate]
> > 2013-07-12 10:22:15,530 INFO org.apache.hadoop.io.compress.CodecPool: Got brand-new compressor [.deflate]
> > < -- last log entry, region server is down here -- >
> >
> >
> > Datanode log, same machine
> >
> > 2013-07-12 10:22:04,811 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: XXXXXXX:50010:DataXceiver error processing WRITE_BLOCK operation  src: /YYY.YY.YYY.YY:36024 dest: /XXX.XX.XXX.XX:50010
> > java.io.IOException: Premature EOF from inputStream
> >     at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:194)
> >     at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doReadFully(PacketReceiver.java:213)
> >     at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doRead(PacketReceiver.java:134)
> >     at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.receiveNextPacket(PacketReceiver.java:109)
> >     at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:414)
> >     at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:635)
> >     at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:564)
> >     at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:103)
> >     at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:67)
> >     at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:221)
> >     at java.lang.Thread.run(Thread.java:724)
> > < -- many repetitions of this -- >
> >
> > What could have caused this difference in stability?
> >
> > We did not change any configuration settings with respect to the previous
> > CDH 4.0.1 setup. In particular, we left ulimit and
> > dfs.datanode.max.xcievers at 32k. If need be, I can provide more complete
> > log/configuration information.
> >
> > Thank you,
> >
> > /David
> >
>
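
Following up on the suggestion above about the native Hadoop .so: here is a minimal sketch (the class name and classpath setup are my assumptions, not from this thread) that uses Hadoop's NativeCodeLoader to report whether the native library even loads in a given JVM. Run it with the same java binary, classpath and java.library.path as the region server.

import org.apache.hadoop.util.NativeCodeLoader;

public class CheckNativeHadoop {
    public static void main(String[] args) {
        // True only if libhadoop.so was found and loaded by this JVM.
        System.out.println("native hadoop loaded: " + NativeCodeLoader.isNativeCodeLoaded());
    }
}

If it prints false, the native Hadoop code is probably not what is crashing the process, and the JVM version (or memory, as suggested above) becomes the more likely suspect.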
