hbase-user mailing list archives

From: Samir Ahmic <ahmic.sa...@gmail.com>
Subject: Re: Regionserver crash
Date: Tue, 17 Jun 2014 11:43:45 GMT
Hi Ian,

What does hadoop fsck / say? Maybe you have some corrupted data on your
cluster. Also try using hbase hbck to investigate the issue. If you have disk
space issues, try adding more datanodes to your cluster. Regarding the errors
you have sent, they are thrown because ProtobufLogWriter is unable to write
the WALTrailer.
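
For reference, the checks I have in mind look roughly like this (a sketch
only - exact option names can vary a bit between Hadoop/HBase versions):

    # Look for corrupt, missing or under-replicated blocks in HDFS
    hadoop fsck / -files -blocks -locations    # on newer releases: hdfs fsck /

    # Report total and per-datanode capacity and remaining space
    hdfs dfsadmin -report

    # Report region and assignment inconsistencies in HBase
    hbase hbck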
Also try checking the DataNode logs on the servers where the regions are
hosted. What versions of Hadoop/HBase are you using?
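
If it helps, something like this on each affected node should surface DataNode
errors from around the time of the crash (the log path is a guess for a
typical install - adjust it for your distribution):

    # Pull recent errors/exceptions from the DataNode log (path is a guess)
    grep -iE "error|exception" /var/log/hadoop-hdfs/hadoop-*-datanode-*.log | tail -n 50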

Good luck
Samir


On Tue, Jun 17, 2014 at 11:07 AM, Ian Brooks <i.brooks@sensewhere.com>
wrote:

> Hi,
>
> I had an issue last night where one of the regionservers crashed due to
> running out of memory while doing a count on a table with 8M rows. This
> left the system in a state where it thought the regions on that server were
> still online, but they couldn't be accessed. Trying to either move or
> offline the regions resulted in a timeout failure.
>
> I ended up taking the whole hbase cluster down, but when I tried to bring
> it back up again, it failed to bring those regions back online, mainly
> spewing out vast amounts of logs relating to IO errors on the WAL splitting
> files and .tmp files.
>
> Unfortunately I don't have most of the logs anymore, as the errors
> resulted in the nodes running out of disk space, but most of the errors
> were like the one below - there were no errors in the HDFS logs for this
> time.
>
> 2014-06-16 17:47:02,902 ERROR [regionserver16020.logRoller]
> wal.ProtobufLogWriter: Got IOException while writing trailer
> org.apache.hadoop.ipc.RemoteException(java.io.IOException): File
> /user/hbase/WALs/sw-hadoop-008,16020,1402933620893/sw-hadoop-008%2C16020%2C1402933620893.1402933622645
> could only be replicated to 0 nodes instead of minReplication (=1).  There
> are 8 datanode(s) running and no node(s) are excluded in this operation.
>         at
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget(BlockManager.java:1430)
>         at
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2684)
>         at
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:584)
>         at
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:440)
>         at
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>         at
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
>         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:415)
>         at
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
>         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)
>
>         at org.apache.hadoop.ipc.Client.call(Client.java:1347)
>         at org.apache.hadoop.ipc.Client.call(Client.java:1300)
>         at
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
>         at com.sun.proxy.$Proxy13.addBlock(Unknown Source)
>         at
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:330)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>         at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:606)
>         at
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:186)
>         at
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
>         at com.sun.proxy.$Proxy14.addBlock(Unknown Source)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>         at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:606)
>         at
> org.apache.hadoop.hbase.fs.HFileSystem$1.invoke(HFileSystem.java:294)
>         at com.sun.proxy.$Proxy15.addBlock(Unknown Source)
>         at
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.locateFollowingBlock(DFSOutputStream.java:1226)
>         at
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1078)
>         at
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:514)
> 2014-06-16 17:47:02,902 ERROR [regionserver16020.logRoller] wal.FSHLog:
> Failed close of HLog writer
> org.apache.hadoop.ipc.RemoteException(java.io.IOException): File
> /user/hbase/WALs/sw-hadoop-008,16020,1402933620893/sw-hadoop-008%2C16020%2C1402933620893.1402933622645
> could only be replicated to 0 nodes instead of minReplication (=1).  There
> are 8 datanode(s) running and no node(s) are excluded in this operation.
>         at
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget(BlockManager.java:1430)
>         at
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2684)
>         at
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:584)
>         at
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:440)
>         at
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>         at
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
>         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:415)
>         at
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
>         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)
>
>         at org.apache.hadoop.ipc.Client.call(Client.java:1347)
>         at org.apache.hadoop.ipc.Client.call(Client.java:1300)
>         at
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
>         at com.sun.proxy.$Proxy13.addBlock(Unknown Source)
>         at
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:330)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>         at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:606)
>         at
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:186)
>         at
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
>         at com.sun.proxy.$Proxy14.addBlock(Unknown Source)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>         at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:606)
>         at
> org.apache.hadoop.hbase.fs.HFileSystem$1.invoke(HFileSystem.java:294)
>         at com.sun.proxy.$Proxy15.addBlock(Unknown Source)
>         at
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.locateFollowingBlock(DFSOutputStream.java:1226)
>         at
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1078)
>         at
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:514)
> 2014-06-16 17:47:02,909 WARN  [regionserver16020.logRoller] wal.FSHLog:
> Riding over HLog close failure! error count=1
>
>
> If the regions are marked as online but the shell won't let you do
> anything, what is the best/correct way to get them back online again?
>
> -Ian Brooks
>
