hbase-dev mailing list archives

From Álvaro Recuero <algar...@gmail.com>
Subject Re: Error in RS with 0.94.8
Date Fri, 25 Apr 2014 08:32:36 GMT
Data nodes are fine. Actually, the region server on that serverxxxxx is
the only one dead afterwards. Its datanode is up, and HDFS is reporting a
healthy status. Interesting that this is even possible.
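
(A quick cross-check for anyone reproducing this: list the region servers
the HMaster considers live next to the datanodes the namenode considers
live. A minimal sketch with the stock 0.94-era CLIs, not tied to this
cluster:)

  # Region servers the master sees as live/dead
  echo "status 'simple'" | hbase shell

  # Datanodes the namenode sees, with per-node capacity and remaining space
  hadoop dfsadmin -report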

I have repeatedly run into the problem again while testing a new HBase
cluster, so yes, I would bet the problem is somewhere in HDFS. Something
is probably missing there.

2014-04-24 17:59:30,003 WARN org.apache.hadoop.hdfs.DFSClient: Error Recovery for block null bad datanode[0] nodes == null
2014-04-24 17:59:30,003 WARN org.apache.hadoop.hdfs.DFSClient: Could not get block locations. Source file "/hbase/.logs/serverxxxxx,1398350408274/serverxxxxx%2C60020%2C1398350408274.1398350409004" - Aborting...
2014-04-24 17:59:30,003 ERROR org.apache.hadoop.hbase.regionserver.wal.HLog: syncer encountered error, will retry. txid=1
org.apache.hadoop.ipc.RemoteException: java.io.IOException: File /hbase/.logs/serverxxxxx,60020,1398350408274/serverxxxxx%2C60020%2C1398350408274.1398350409004 could only be replicated to 0 nodes, instead of 1
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1558)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:696)
        at sun.reflect.GeneratedMethodAccessor10.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:616)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:563)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1388)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1384)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:416)
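
(For later readers: "could only be replicated to 0 nodes, instead of 1"
means the namenode could not find even one datanode willing to accept a
new block for the WAL file. A minimal checklist under the Hadoop 1.x CLI
used in this thread, not necessarily the exact commands run here:)

  # Zero live datanodes, or zero remaining bytes on every live one,
  # both produce exactly this error
  hadoop dfsadmin -report

  # The namenode's view of the WAL directory; -openforwrite lists files
  # stuck mid-write, like the HLog above
  hadoop fsck /hbase/.logs -files -blocks -locations -openforwrite

  # Datanodes can also refuse blocks when dfs.datanode.du.reserved in
  # hdfs-site.xml leaves less usable space than the free disk suggests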


On 5 April 2014 21:58, Álvaro Recuero <algarecu@gmail.com> wrote:

> Yes Esteban, I have checked the health of the datanodes from the master
> in the Hadoop console. Nothing seems wrong enough to cause this, even
> though one datanode is apparently lost along with the RS in the process
> of inserting 50 million updates... the other 11 are there, up and
> running, so the cluster should just carry on from there (as long as it
> is replicating as it should through the HDFS write pipeline). I thought
> of HBase write-key hotspotting or some problem in the Hadoop namenode,
> so I am checking that out now...
>
> I will keep investigating and let you know. In fact my first thought was
> the same as yours, but ./hadoop fsck / shows that all "active" nodes are
> healthy and that no filesystem-level inconsistencies are detected (the
> first thing I checked before sending the post). Running the HBase hbck
> consistency check from the command line behaves differently, though: it
> misses the mentioned RS and throws the corresponding exception in its
> log... that is a weird one then... I might check the namenode before I
> get back to you on this. I can't think of anything else as of now. Space
> is not unlimited, yet sufficient on each of the 12 datanodes; it is
> getting close to its limit on the dead RS, so writes are indeed not very
> well balanced, but that is definitely not the issue as I understand it.
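>
> (Spelled out, the two consistency checks mentioned above -- a minimal
> sketch; both are the standard commands for this Hadoop/HBase generation:)
>
>   # HDFS-level check: should end with "The filesystem under path '/' is HEALTHY"
>   ./hadoop fsck /
>
>   # HBase-level check of region assignments and .META. integrity; this is
>   # the run that flags the missing region server
>   ./hbase hbck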
>
>
> On 5 April 2014 19:16, Esteban Gutierrez <esteban@cloudera.com> wrote:
>
>> Álvaro,
>>
>> Have you checked the health of HDFS? Maybe your cluster ran out of
>> space, or you don't have any datanodes running.
>>
>> Esteban
>>
>> > On Apr 5, 2014, at 10:11, haosdent <haosdent@gmail.com> wrote:
>> >
>> > From the log information, it seems you lost blocks.
>> > On 2014-04-06 at 12:38 AM, "Álvaro Recuero" <algarecu@gmail.com> wrote:
>> >
>> >> Has anyone come across this before? There is still space in the RS,
>> >> and this is not a problem of datanode availability, as I can confirm.
>> >> Cheers
>> >>
>> >> 2014-04-05 09:55:19,210 DEBUG org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter: using new createWriter -- HADOOP-6840
>> >> 2014-04-05 09:55:19,211 DEBUG org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter: Path=hdfs://taurus-5.lyon.grid5000.fr:9000/hbase/usertable/fc55e2d2d4bcec49d6fedf5a469353b9/recovered.edits/0000000000002550928.temp, syncFs=true, hflush=false, compression=false
>> >> 2014-04-05 09:55:19,211 DEBUG org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: Creating writer path=hdfs://taurus-5.lyon.grid5000.fr:9000/hbase/usertable/fc55e2d2d4bcec49d6fedf5a469353b9/recovered.edits/0000000000002550928.temp region=fc55e2d2d4bcec49d6fedf5a469353b9
>> >> 2014-04-05 09:55:19,233 DEBUG org.apache.hadoop.hbase.regionserver.SplitLogWorker: tasks arrived or departed
>> >> 2014-04-05 09:55:19,233 WARN org.apache.hadoop.hdfs.DFSClient: DataStreamer Exception: org.apache.hadoop.ipc.RemoteException: java.io.IOException: File /hbase/usertable/237859a0b1e47c86c25a6123506ccb2a/recovered.edits/0000000000002550921.temp could only be replicated to 0 nodes, instead of 1
>> >>        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1558)
>> >>        at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:696)
>> >>        at sun.reflect.GeneratedMethodAccessor12.invoke(Unknown Source)
>> >>        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>> >>        at java.lang.reflect.Method.invoke(Method.java:616)
>> >>        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:563)
>> >>        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1388)
>> >>        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1384)
>> >>        at java.security.AccessController.doPrivileged(Native Method)
>> >>        at javax.security.auth.Subject.doAs(Subject.java:416)
>> >>        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
>> >>        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1382)
>> >>
>> >>        at org.apache.hadoop.ipc.Client.call(Client.java:1070)
>> >>        at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:225)
>> >>        at sun.proxy.$Proxy9.addBlock(Unknown Source)
>> >>        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>> >>        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>> >>        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>> >>        at java.lang.reflect.Method.invoke(Method.java:616)
>> >>        at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
>> >>        at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
>> >>        at sun.proxy.$Proxy9.addBlock(Unknown Source)
>> >>        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.locateFollowingBlock(DFSClient.java:3510)
>> >>        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:3373)
>> >>        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2600(DFSClient.java:2589)
>> >>        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2829)
>> >>
>> >> 2014-04-05 09:55:19,233 WARN org.apache.hadoop.hdfs.DFSClient: Error Recovery for block null bad datanode[0] nodes == null
>> >> 2014-04-05 09:55:19,233 WARN org.apache.hadoop.hdfs.DFSClient: Could not get block locations. Source file "/hbase/usertable/237859a0b1e47c86c25a6123506ccb2a/recovered.edits/0000000000002550921.temp" - Aborting...
>> >>
>>
>
>
