hbase-user mailing list archives

From Jeff Whiting <je...@qualtrics.com>
Subject Re: Should a data node restart cause a region server to go down?
Date Mon, 06 Feb 2012 20:08:33 GMT
Would "hadoop fsck /" show that type of problem if there really were no nodes with that data? The
worst I've seen is:  Target Replicas is 4 but found 3 replica(s).
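
For reference, this is roughly what I've been running to check replica health; the flags are
standard fsck options, and the /hbase paths are just examples from my layout.  As I understand
it, fsck skips files that are still open for write by default, so the -openforwrite run is the
one that would cover the active HLogs:

    # overall filesystem health: missing / corrupt / under-replicated blocks
    hadoop fsck /
    # per-file block and replica locations under the HBase root
    hadoop fsck /hbase -files -blocks -locations
    # include files currently open for write (e.g. the region server HLogs)
    hadoop fsck /hbase/.logs -openforwrite -files -blocks -locations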

~Jeff

On 2/6/2012 12:45 PM, Ted Yu wrote:
> In your case Error Recovery wasn't successful because of:
> All datanodes 10.49.29.92:50010 are bad. Aborting...
>
> On Mon, Feb 6, 2012 at 10:28 AM, Jeff Whiting<jeffw@qualtrics.com>  wrote:
>
>> I was increasing the storage on some of my data nodes and thus had to
>> restart the data node.  I use cdh3u2 and ran
>> "/etc/init.d/hadoop-0.20-datanode restart" (I don't think this is a cdh
>> problem).  Unfortunately the restart caused region servers to go offline.
>> Is this expected behavior?  It seems like it should recover from that just
>> fine, without giving up and dying, since there were other data nodes
>> available.  Here are the logs on the region server from when I restarted
>> the data node to when it decided to give up.  For a little background, I'm
>> running a small cluster with 4 region servers and 4 data nodes.
>>
>> Thanks,
>> ~Jeff
>>
>> 12/02/06 18:06:03 WARN hdfs.DFSClient: DFSOutputStream ResponseProcessor
>> exception  for block blk_-4249058562504578427_18197java.io.IOException:
>> Bad response 1 for block blk_-4249058562504578427_18197 from datanode
>> 10.49.129.134:50010
>>     at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$ResponseProcessor.run(DFSClient.java:2664)
>>
>> 12/02/06 18:06:03 INFO hdfs.DFSClient: Error Recovery for block
>> blk_-4249058562504578427_18197 waiting for responder to exit.
>> 12/02/06 18:06:03 WARN hdfs.DFSClient: Error Recovery for block
>> blk_-4249058562504578427_18197 bad datanode[2] 10.49.129.134:50010
>> 12/02/06 18:06:03 WARN hdfs.DFSClient: Error Recovery for block
>> blk_-4249058562504578427_18197 in pipeline 10.59.39.142:50010,
>> 10.234.50.225:50010, 10.49.129.134:50010, 10.49.29.92:50010: bad datanode
>> 10.49.129.134:50010
>> 12/02/06 18:06:03 WARN wal.HLog: HDFS pipeline error detected. Found 3
>> replicas but expecting 4 replicas.  Requesting close of hlog.
>> 12/02/06 18:06:03 INFO wal.SequenceFileLogWriter: Using syncFs -- HDFS-200
>> 12/02/06 18:06:03 WARN regionserver.ReplicationSourceManager:
>> Replication stopped, won't add new log
>> 12/02/06 18:06:03 INFO wal.HLog: Roll
>> /hbase/.logs/ip-10-59-39-142.eu-west-1.compute.internal,60020,1328142685179/ip-10-59-39-142.eu-west-1.compute.internal%3A60020.1328549504988,
>> entries=3644, filesize=12276680. New hlog
>> /hbase/.logs/ip-10-59-39-142.eu-west-1.compute.internal,60020,1328142685179/ip-10-59-39-142.eu-west-1.compute.internal%3A60020.1328551563518
>>
>> 12/02/06 18:06:04 INFO hdfs.DFSClient: Exception in
>> createBlockOutputStream java.io.IOException: Bad connect ack with
>> firstBadLink as 10.49.129.134:50010
>> 12/02/06 18:06:04 INFO hdfs.DFSClient: Abandoning block
>> blk_6156813298944908969_18211
>> 12/02/06 18:06:04 INFO hdfs.DFSClient: Excluding datanode
>> 10.49.129.134:50010
>> 12/02/06 18:06:04 WARN wal.HLog: HDFS pipeline error detected. Found 3
>> replicas but expecting 4 replicas.  Requesting close of hlog.
>> 12/02/06 18:07:06 WARN hdfs.DFSClient: DFSOutputStream ResponseProcessor
>> exception  for block blk_-165678744483388406_18211java.io.IOException:
>> Bad response 1 for block blk_-165678744483388406_18211 from datanode
>> 10.234.50.225:50010
>>     at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$ResponseProcessor.run(DFSClient.java:2664)
>>
>> 12/02/06 18:07:06 INFO hdfs.DFSClient: Error Recovery for block
>> blk_-165678744483388406_18211 waiting for responder to exit.
>> 12/02/06 18:07:06 WARN hdfs.DFSClient: Error Recovery for block
>> blk_-165678744483388406_18211 bad datanode[2] 10.234.50.225:50010
>> 12/02/06 18:07:06 WARN hdfs.DFSClient: Error Recovery for block
>> blk_-165678744483388406_18211 in pipeline 10.59.39.142:50010,
>> 10.49.29.92:50010, 10.234.50.225:50010: bad datanode 10.234.50.225:50010
>> 12/02/06 18:09:21 WARN hdfs.DFSClient: DFSOutputStream ResponseProcessor
>> exception  for block blk_-165678744483388406_18214java.io.IOException:
>> Connection reset by peer
>>     at sun.nio.ch.FileDispatcher.read0(Native Method)
>>     at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:21)
>>     at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:237)
>>     at sun.nio.ch.IOUtil.read(IOUtil.java:210)
>>     at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:236)
>>     at org.apache.hadoop.net.SocketInputStream$Reader.performIO(SocketInputStream.java:55)
>>     at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142)
>>     at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:155)
>>     at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:128)
>>     at java.io.DataInputStream.readFully(DataInputStream.java:178)
>>     at java.io.DataInputStream.readLong(DataInputStream.java:399)
>>     at org.apache.hadoop.hdfs.protocol.DataTransferProtocol$PipelineAck.readFields(DataTransferProtocol.java:120)
>>     at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$ResponseProcessor.run(DFSClient.java:2634)
>>
>> 12/02/06 18:09:21 INFO hdfs.DFSClient: Error Recovery for block
>> blk_-165678744483388406_18214 waiting for responder to exit.
>> 12/02/06 18:09:21 WARN hdfs.DFSClient: Error Recovery for block
>> blk_-165678744483388406_18214 bad datanode[0] 10.59.39.142:50010
>> 12/02/06 18:09:21 WARN hdfs.DFSClient: Error Recovery for block
>> blk_-165678744483388406_18214 in pipeline 10.59.39.142:50010,
>> 10.49.29.92:50010: bad datanode 10.59.39.142:50010
>> 12/02/06 18:09:55 WARN hdfs.DFSClient: DFSOutputStream ResponseProcessor
>> exception  for block blk_-165678744483388406_18221java.io.IOException:
>> Connection reset by peer
>>     at sun.nio.ch.FileDispatcher.read0(Native Method)
>>     at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:21)
>>     at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:237)
>>     at sun.nio.ch.IOUtil.read(IOUtil.java:210)
>>     at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:236)
>>     at org.apache.hadoop.net.SocketInputStream$Reader.performIO(SocketInputStream.java:55)
>>     at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142)
>>     at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:155)
>>     at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:128)
>>     at java.io.DataInputStream.readFully(DataInputStream.java:178)
>>     at java.io.DataInputStream.readLong(DataInputStream.java:399)
>>     at org.apache.hadoop.hdfs.protocol.DataTransferProtocol$PipelineAck.readFields(DataTransferProtocol.java:120)
>>     at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$ResponseProcessor.run(DFSClient.java:2634)
>>
>> 12/02/06 18:09:55 INFO hdfs.DFSClient: Error Recovery for block
>> blk_-165678744483388406_18221 waiting for responder to exit.
>> 12/02/06 18:09:56 WARN hdfs.DFSClient: Error Recovery for block
>> blk_-165678744483388406_18221 bad datanode[0] 10.49.29.92:50010
>> 12/02/06 18:09:56 WARN hdfs.DFSClient: Error while syncing
>> java.io.IOException: All datanodes 10.49.29.92:50010 are bad. Aborting...
>>     at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:2766)
>>     at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$1600(DFSClient.java:2305)
>>     at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2477)
>> 12/02/06 18:09:56 WARN hdfs.DFSClient: Error while syncing
>> java.io.IOException: All datanodes 10.49.29.92:50010 are bad. Aborting...
>>     at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:2766)
>>     at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$1600(DFSClient.java:2305)
>>     at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2477)
>> 12/02/06 18:09:56 FATAL wal.HLog: Could not append. Requesting close of
>> hlog
>> java.io.IOException: Reflection
>>     at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.sync(SequenceFileLogWriter.java:147)
>>     at org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:981)
>>     at org.apache.hadoop.hbase.regionserver.wal.HLog$LogSyncer.run(HLog.java:958)
>> Caused by: java.lang.reflect.InvocationTargetException
>>     at sun.reflect.GeneratedMethodAccessor5.invoke(Unknown Source)
>>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>     at java.lang.reflect.Method.invoke(Method.java:597)
>>     at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.sync(SequenceFileLogWriter.java:145)
>>     ... 2 more
>> Caused by: java.io.IOException: All datanodes 10.49.29.92:50010 are bad.
>> Aborting...
>>     at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:2766)
>>     at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$1600(DFSClient.java:2305)
>>     at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2477)
>> 12/02/06 18:09:56 ERROR wal.HLog: Error while syncing, requesting close of
>> hlog
>> java.io.IOException: Reflection
>>     at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.sync(SequenceFileLogWriter.java:147)
>>     at org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:981)
>>     at org.apache.hadoop.hbase.regionserver.wal.HLog$LogSyncer.run(HLog.java:958)
>> Caused by: java.lang.reflect.InvocationTargetException
>>     at sun.reflect.GeneratedMethodAccessor5.invoke(Unknown Source)
>>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>     at java.lang.reflect.Method.invoke(Method.java:597)
>>     at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.sync(SequenceFileLogWriter.java:145)
>>     ... 2 more
>> Caused by: java.io.IOException: All datanodes 10.49.29.92:50010 are bad.
>> Aborting...
>>
>> --
>> Jeff Whiting
>> Qualtrics Senior Software Engineer
>> jeffw@qualtrics.com
>>
>>

-- 
Jeff Whiting
Qualtrics Senior Software Engineer
jeffw@qualtrics.com

