hbase-user mailing list archives

From Jeff Whiting <je...@qualtrics.com>
Subject Should a data node restart cause a region server to go down?
Date Mon, 06 Feb 2012 18:28:04 GMT
I was increasing the storage on some of my data nodes and so had to restart the data node process. I use cdh3u2 and ran "/etc/init.d/hadoop-0.20-datanode restart" (I don't think this is a CDH problem). Unfortunately, the restart caused region servers to go offline. Is this expected behavior? It seems like the region server should recover from that just fine without giving up and dying, since there were other data nodes available. For a little background, I'm running a small cluster with 4 region servers and 4 data nodes. Below are the region server logs from when I restarted the data node to when it decided to give up.
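
For reference, the restart itself was nothing more exotic than the stock CDH init script, run on one data node at a time. A rough sketch of the sequence (the dfsadmin check afterwards is just illustrative; I didn't capture its output):

# On the data node whose storage I was increasing (one node at a time):
/etc/init.d/hadoop-0.20-datanode restart

# Illustrative sanity check from another host that the data node came back
# and re-registered with the namenode (exact report format varies by version):
hadoop dfsadmin -report | grep "Datanodes available"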

Thanks,
~Jeff

12/02/06 18:06:03 WARN hdfs.DFSClient: DFSOutputStream ResponseProcessor exception  for block blk_-4249058562504578427_18197java.io.IOException: Bad response 1 for block blk_-4249058562504578427_18197 from datanode 10.49.129.134:50010
     at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$ResponseProcessor.run(DFSClient.java:2664)

12/02/06 18:06:03 INFO hdfs.DFSClient: Error Recovery for block blk_-4249058562504578427_18197 waiting for responder to exit.
12/02/06 18:06:03 WARN hdfs.DFSClient: Error Recovery for block blk_-4249058562504578427_18197 bad datanode[2] 10.49.129.134:50010
12/02/06 18:06:03 WARN hdfs.DFSClient: Error Recovery for block blk_-4249058562504578427_18197 in pipeline 10.59.39.142:50010, 10.234.50.225:50010, 10.49.129.134:50010, 10.49.29.92:50010: bad datanode 10.49.129.134:50010
12/02/06 18:06:03 WARN wal.HLog: HDFS pipeline error detected. Found 3 replicas but expecting 4 replicas.  Requesting close of hlog.
12/02/06 18:06:03 INFO wal.SequenceFileLogWriter: Using syncFs -- HDFS-200
12/02/06 18:06:03 WARN regionserver.ReplicationSourceManager: Replication stopped, won't add new log
12/02/06 18:06:03 INFO wal.HLog: Roll /hbase/.logs/ip-10-59-39-142.eu-west-1.compute.internal,60020,1328142685179/ip-10-59-39-142.eu-west-1.compute.internal%3A60020.1328549504988, entries=3644, filesize=12276680. New hlog /hbase/.logs/ip-10-59-39-142.eu-west-1.compute.internal,60020,1328142685179/ip-10-59-39-142.eu-west-1.compute.internal%3A60020.1328551563518
12/02/06 18:06:04 INFO hdfs.DFSClient: Exception in createBlockOutputStream java.io.IOException: Bad connect ack with firstBadLink as 10.49.129.134:50010
12/02/06 18:06:04 INFO hdfs.DFSClient: Abandoning block blk_6156813298944908969_18211
12/02/06 18:06:04 INFO hdfs.DFSClient: Excluding datanode 10.49.129.134:50010
12/02/06 18:06:04 WARN wal.HLog: HDFS pipeline error detected. Found 3 replicas but expecting 4 replicas.  Requesting close of hlog.
12/02/06 18:07:06 WARN hdfs.DFSClient: DFSOutputStream ResponseProcessor exception  for block blk_-165678744483388406_18211java.io.IOException: Bad response 1 for block blk_-165678744483388406_18211 from datanode 10.234.50.225:50010
     at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$ResponseProcessor.run(DFSClient.java:2664)

12/02/06 18:07:06 INFO hdfs.DFSClient: Error Recovery for block blk_-165678744483388406_18211 waiting for responder to exit.
12/02/06 18:07:06 WARN hdfs.DFSClient: Error Recovery for block blk_-165678744483388406_18211 bad datanode[2] 10.234.50.225:50010
12/02/06 18:07:06 WARN hdfs.DFSClient: Error Recovery for block blk_-165678744483388406_18211 in pipeline 10.59.39.142:50010, 10.49.29.92:50010, 10.234.50.225:50010: bad datanode 10.234.50.225:50010
12/02/06 18:09:21 WARN hdfs.DFSClient: DFSOutputStream ResponseProcessor exception  for block blk_-165678744483388406_18214java.io.IOException: Connection reset by peer
     at sun.nio.ch.FileDispatcher.read0(Native Method)
     at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:21)
     at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:237)
     at sun.nio.ch.IOUtil.read(IOUtil.java:210)
     at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:236)
     at org.apache.hadoop.net.SocketInputStream$Reader.performIO(SocketInputStream.java:55)
     at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142)
     at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:155)
     at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:128)
     at java.io.DataInputStream.readFully(DataInputStream.java:178)
     at java.io.DataInputStream.readLong(DataInputStream.java:399)
     at org.apache.hadoop.hdfs.protocol.DataTransferProtocol$PipelineAck.readFields(DataTransferProtocol.java:120)
     at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$ResponseProcessor.run(DFSClient.java:2634)

12/02/06 18:09:21 INFO hdfs.DFSClient: Error Recovery for block blk_-165678744483388406_18214 waiting for responder to exit.
12/02/06 18:09:21 WARN hdfs.DFSClient: Error Recovery for block blk_-165678744483388406_18214 bad datanode[0] 10.59.39.142:50010
12/02/06 18:09:21 WARN hdfs.DFSClient: Error Recovery for block blk_-165678744483388406_18214 in pipeline 10.59.39.142:50010, 10.49.29.92:50010: bad datanode 10.59.39.142:50010
12/02/06 18:09:55 WARN hdfs.DFSClient: DFSOutputStream ResponseProcessor exception  for block blk_-165678744483388406_18221java.io.IOException: Connection reset by peer
     at sun.nio.ch.FileDispatcher.read0(Native Method)
     at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:21)
     at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:237)
     at sun.nio.ch.IOUtil.read(IOUtil.java:210)
     at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:236)
     at org.apache.hadoop.net.SocketInputStream$Reader.performIO(SocketInputStream.java:55)
     at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142)
     at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:155)
     at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:128)
     at java.io.DataInputStream.readFully(DataInputStream.java:178)
     at java.io.DataInputStream.readLong(DataInputStream.java:399)
     at org.apache.hadoop.hdfs.protocol.DataTransferProtocol$PipelineAck.readFields(DataTransferProtocol.java:120)
     at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$ResponseProcessor.run(DFSClient.java:2634)

12/02/06 18:09:55 INFO hdfs.DFSClient: Error Recovery for block blk_-165678744483388406_18221 waiting for responder to exit.
12/02/06 18:09:56 WARN hdfs.DFSClient: Error Recovery for block blk_-165678744483388406_18221 bad datanode[0] 10.49.29.92:50010
12/02/06 18:09:56 WARN hdfs.DFSClient: Error while syncing
java.io.IOException: All datanodes 10.49.29.92:50010 are bad. Aborting...
     at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:2766)
     at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$1600(DFSClient.java:2305)
     at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2477)
12/02/06 18:09:56 WARN hdfs.DFSClient: Error while syncing
java.io.IOException: All datanodes 10.49.29.92:50010 are bad. Aborting...
     at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:2766)
     at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$1600(DFSClient.java:2305)
     at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2477)
12/02/06 18:09:56 FATAL wal.HLog: Could not append. Requesting close of hlog
java.io.IOException: Reflection
     at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.sync(SequenceFileLogWriter.java:147)
     at org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:981)
     at org.apache.hadoop.hbase.regionserver.wal.HLog$LogSyncer.run(HLog.java:958)
Caused by: java.lang.reflect.InvocationTargetException
     at sun.reflect.GeneratedMethodAccessor5.invoke(Unknown Source)
     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
     at java.lang.reflect.Method.invoke(Method.java:597)
     at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.sync(SequenceFileLogWriter.java:145)
     ... 2 more
Caused by: java.io.IOException: All datanodes 10.49.29.92:50010 are bad. Aborting...
     at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:2766)
     at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$1600(DFSClient.java:2305)
     at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2477)
12/02/06 18:09:56 ERROR wal.HLog: Error while syncing, requesting close of hlog
java.io.IOException: Reflection
     at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.sync(SequenceFileLogWriter.java:147)
     at org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:981)
     at org.apache.hadoop.hbase.regionserver.wal.HLog$LogSyncer.run(HLog.java:958)
Caused by: java.lang.reflect.InvocationTargetException
     at sun.reflect.GeneratedMethodAccessor5.invoke(Unknown Source)
     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
     at java.lang.reflect.Method.invoke(Method.java:597)
     at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.sync(SequenceFileLogWriter.java:145)
     ... 2 more
Caused by: java.io.IOException: All datanodes 10.49.29.92:50010 are bad. Aborting...

-- 
Jeff Whiting
Qualtrics Senior Software Engineer
jeffw@qualtrics.com

