hbase-user mailing list archives

From Ted Yu <yuzhih...@gmail.com>
Subject Re: Should a data node restart cause a region server to go down?
Date Mon, 06 Feb 2012 19:45:54 GMT
In your case Error Recovery wasn't successful because of:
All datanodes 10.49.29.92:50010 are bad. Aborting...
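
In this Hadoop version the write pipeline only shrinks on error: a failed
datanode is dropped and never replaced, so once every node in the pipeline
for that hlog block had been marked bad (10.234.50.225, then 10.59.39.142,
then 10.49.29.92) there was nothing left to write to and the sync aborted,
which is what takes the region server down. If you want to check the
cluster's side of it, something like this should do (standard commands; the
/hbase/.logs path is taken from the roll message in your log):

  hadoop dfsadmin -report
  hadoop fsck /hbase/.logs -files -blocks -locations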

On Mon, Feb 6, 2012 at 10:28 AM, Jeff Whiting <jeffw@qualtrics.com> wrote:

> I was increasing the storage on some of my data nodes and thus had to do a
> restart of the data node.  I use cdh3u2 and ran
> "/etc/init.d/hadoop-0.20-datanode restart" (I don't think this is a cdh
> problem). Unfortunately doing the restart caused region servers to go
> offline.  Is this expected behavior?  It seems like it should recover from
> that just fine without giving up and dying, since there were other data
> nodes available.  Here are the logs on the region server from when I
> restarted the data node to when it decided to give up.  To give you a
> little background, I'm running a small cluster with 4 region servers and 4
> data nodes.
>
> Thanks,
> ~Jeff
>
> 12/02/06 18:06:03 WARN hdfs.DFSClient: DFSOutputStream ResponseProcessor exception  for block blk_-4249058562504578427_18197java.io.IOException: Bad response 1 for block blk_-4249058562504578427_18197 from datanode 10.49.129.134:50010
>    at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$ResponseProcessor.run(DFSClient.java:2664)
>
> 12/02/06 18:06:03 INFO hdfs.DFSClient: Error Recovery for block
> blk_-4249058562504578427_18197 waiting for responder to exit.
> 12/02/06 18:06:03 WARN hdfs.DFSClient: Error Recovery for block
> blk_-4249058562504578427_18197 bad datanode[2] 10.49.129.134:50010
> 12/02/06 18:06:03 WARN hdfs.DFSClient: Error Recovery for block
> blk_-4249058562504578427_18197 in pipeline 10.59.39.142:50010,
> 10.234.50.225:50010, 10.49.129.134:50010, 10.49.29.92:50010: bad datanode
> 10.49.129.134:50010
> 12/02/06 18:06:03 WARN wal.HLog: HDFS pipeline error detected. Found 3
> replicas but expecting 4 replicas.  Requesting close of hlog.
> 12/02/06 18:06:03 INFO wal.SequenceFileLogWriter: Using syncFs -- HDFS-200
> 12/02/06 18:06:03 WARN regionserver.ReplicationSourceManager: Replication stopped, won't add new log
> 12/02/06 18:06:03 INFO wal.HLog: Roll /hbase/.logs/ip-10-59-39-142.eu-west-1.compute.internal,60020,1328142685179/ip-10-59-39-142.eu-west-1.compute.internal%3A60020.1328549504988, entries=3644, filesize=12276680. New hlog /hbase/.logs/ip-10-59-39-142.eu-west-1.compute.internal,60020,1328142685179/ip-10-59-39-142.eu-west-1.compute.internal%3A60020.1328551563518
>
> 12/02/06 18:06:04 INFO hdfs.DFSClient: Exception in
> createBlockOutputStream java.io.IOException: Bad connect ack with
> firstBadLink as 10.49.129.134:50010
> 12/02/06 18:06:04 INFO hdfs.DFSClient: Abandoning block
> blk_6156813298944908969_18211
> 12/02/06 18:06:04 INFO hdfs.DFSClient: Excluding datanode
> 10.49.129.134:50010
> 12/02/06 18:06:04 WARN wal.HLog: HDFS pipeline error detected. Found 3
> replicas but expecting 4 replicas.  Requesting close of hlog.
> 12/02/06 18:07:06 WARN hdfs.DFSClient: DFSOutputStream ResponseProcessor exception  for block blk_-165678744483388406_18211java.io.IOException: Bad response 1 for block blk_-165678744483388406_18211 from datanode 10.234.50.225:50010
>    at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$ResponseProcessor.run(DFSClient.java:2664)
>
> 12/02/06 18:07:06 INFO hdfs.DFSClient: Error Recovery for block
> blk_-165678744483388406_18211 waiting for responder to exit.
> 12/02/06 18:07:06 WARN hdfs.DFSClient: Error Recovery for block
> blk_-165678744483388406_18211 bad datanode[2] 10.234.50.225:50010
> 12/02/06 18:07:06 WARN hdfs.DFSClient: Error Recovery for block
> blk_-165678744483388406_18211 in pipeline 10.59.39.142:50010,
> 10.49.29.92:50010, 10.234.50.225:50010: bad datanode 10.234.50.225:50010
> 12/02/06 18:09:21 WARN hdfs.DFSClient: DFSOutputStream ResponseProcessor exception  for block blk_-165678744483388406_18214java.io.IOException: Connection reset by peer
>    at sun.nio.ch.FileDispatcher.read0(Native Method)
>    at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:21)
>    at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:237)
>    at sun.nio.ch.IOUtil.read(IOUtil.java:210)
>    at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:236)
>    at org.apache.hadoop.net.SocketInputStream$Reader.performIO(SocketInputStream.java:55)
>    at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142)
>    at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:155)
>    at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:128)
>    at java.io.DataInputStream.readFully(DataInputStream.java:178)
>    at java.io.DataInputStream.readLong(DataInputStream.java:399)
>    at org.apache.hadoop.hdfs.protocol.DataTransferProtocol$PipelineAck.readFields(DataTransferProtocol.java:120)
>    at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$ResponseProcessor.run(DFSClient.java:2634)
>
> 12/02/06 18:09:21 INFO hdfs.DFSClient: Error Recovery for block
> blk_-165678744483388406_18214 waiting for responder to exit.
> 12/02/06 18:09:21 WARN hdfs.DFSClient: Error Recovery for block
> blk_-165678744483388406_18214 bad datanode[0] 10.59.39.142:50010
> 12/02/06 18:09:21 WARN hdfs.DFSClient: Error Recovery for block
> blk_-165678744483388406_18214 in pipeline 10.59.39.142:50010,
> 10.49.29.92:50010: bad datanode 10.59.39.142:50010
> 12/02/06 18:09:55 WARN hdfs.DFSClient: DFSOutputStream ResponseProcessor exception  for block blk_-165678744483388406_18221java.io.IOException: Connection reset by peer
>    at sun.nio.ch.FileDispatcher.read0(Native Method)
>    at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:21)
>    at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:237)
>    at sun.nio.ch.IOUtil.read(IOUtil.java:210)
>    at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:236)
>    at org.apache.hadoop.net.SocketInputStream$Reader.performIO(SocketInputStream.java:55)
>    at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142)
>    at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:155)
>    at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:128)
>    at java.io.DataInputStream.readFully(DataInputStream.java:178)
>    at java.io.DataInputStream.readLong(DataInputStream.java:399)
>    at org.apache.hadoop.hdfs.protocol.DataTransferProtocol$PipelineAck.readFields(DataTransferProtocol.java:120)
>    at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$ResponseProcessor.run(DFSClient.java:2634)
>
> 12/02/06 18:09:55 INFO hdfs.DFSClient: Error Recovery for block
> blk_-165678744483388406_18221 waiting for responder to exit.
> 12/02/06 18:09:56 WARN hdfs.DFSClient: Error Recovery for block
> blk_-165678744483388406_18221 bad datanode[0] 10.49.29.92:50010
> 12/02/06 18:09:56 WARN hdfs.DFSClient: Error while syncing
> java.io.IOException: All datanodes 10.49.29.92:50010 are bad. Aborting...
>    at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:2766)
>    at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$1600(DFSClient.java:2305)
>    at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2477)
> 12/02/06 18:09:56 WARN hdfs.DFSClient: Error while syncing
> java.io.IOException: All datanodes 10.49.29.92:50010 are bad. Aborting...
>    at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:2766)
>    at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$1600(DFSClient.java:2305)
>    at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2477)
> 12/02/06 18:09:56 FATAL wal.HLog: Could not append. Requesting close of
> hlog
> java.io.IOException: Reflection
>    at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.sync(SequenceFileLogWriter.java:147)
>    at org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:981)
>    at org.apache.hadoop.hbase.regionserver.wal.HLog$LogSyncer.run(HLog.java:958)
> Caused by: java.lang.reflect.InvocationTargetException
>    at sun.reflect.GeneratedMethodAccessor5.invoke(Unknown Source)
>    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>    at java.lang.reflect.Method.invoke(Method.java:597)
>    at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.sync(SequenceFileLogWriter.java:145)
>    ... 2 more
> Caused by: java.io.IOException: All datanodes 10.49.29.92:50010 are bad. Aborting...
>    at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:2766)
>    at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$1600(DFSClient.java:2305)
>    at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2477)
> 12/02/06 18:09:56 ERROR wal.HLog: Error while syncing, requesting close of
> hlog
> java.io.IOException: Reflection
>    at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.sync(SequenceFileLogWriter.java:147)
>    at org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:981)
>    at org.apache.hadoop.hbase.regionserver.wal.HLog$LogSyncer.run(HLog.java:958)
> Caused by: java.lang.reflect.InvocationTargetException
>    at sun.reflect.GeneratedMethodAccessor5.invoke(Unknown Source)
>    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>    at java.lang.reflect.Method.invoke(Method.java:597)
>    at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.sync(SequenceFileLogWriter.java:145)
>    ... 2 more
> Caused by: java.io.IOException: All datanodes 10.49.29.92:50010 are bad.
> Aborting...
>
> --
> Jeff Whiting
> Qualtrics Senior Software Engineer
> jeffw@qualtrics.com
>
>
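
Regarding the "/etc/init.d/hadoop-0.20-datanode restart" step: since a failed
datanode is dropped from an open pipeline rather than replaced on this
version, bouncing datanodes while region servers have hlogs open can end in
exactly this abort. One way to soften it is to decommission the node first so
the namenode stops placing new blocks on it. A rough sketch, assuming
dfs.hosts.exclude in hdfs-site.xml points at /etc/hadoop/conf/dfs.exclude and
the node is dn1.example.com (both are placeholders, adjust for your setup):

  # take the node out of service; wait for "Decommission Status : Decommissioned"
  echo "dn1.example.com" >> /etc/hadoop/conf/dfs.exclude
  hadoop dfsadmin -refreshNodes
  hadoop dfsadmin -report

  # restart it, then put it back in service
  /etc/init.d/hadoop-0.20-datanode restart
  sed -i '/dn1.example.com/d' /etc/hadoop/conf/dfs.exclude
  hadoop dfsadmin -refreshNodes

That won't rescue a pipeline that already includes the node, but it keeps new
hlog blocks off a datanode you are about to take down.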
