hbase-user mailing list archives

From Andrew Purtell <apurt...@apache.org>
Subject Re: Should a data node restart cause a region server to go down?
Date Tue, 07 Feb 2012 07:37:39 GMT
I'm guessing HBASE-4222 is not in that version of CDH HBase?
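
For context, the kind of resilience being discussed looks roughly like this:
instead of letting a single failed sync on a dying pipeline take the
regionserver down, roll the WAL onto a fresh writer (and therefore a fresh set
of datanodes) and retry once before giving up. A minimal sketch of that idea
only; the WalWriter/WalRoller types and method names below are invented for
illustration and are not the actual HBase code:

    import java.io.IOException;

    public class ResilientWalSync {

      /** Stand-in for HBase's WAL writer; sync() may hit a dead HDFS pipeline. */
      interface WalWriter {
        void sync() throws IOException;
        void close() throws IOException;
      }

      /** Stand-in for rolling the log, i.e. opening a writer on a new pipeline. */
      interface WalRoller {
        WalWriter rollWriter() throws IOException;
      }

      private WalWriter writer;
      private final WalRoller roller;

      public ResilientWalSync(WalWriter writer, WalRoller roller) {
        this.writer = writer;
        this.roller = roller;
      }

      /**
       * Sync the WAL. If the current pipeline is dead (e.g. "All datanodes
       * ... are bad"), roll to a new writer and retry once; only a failure
       * on the fresh pipeline is treated as fatal.
       */
      public void syncWithRetry() throws IOException {
        try {
          writer.sync();
        } catch (IOException pipelineDead) {
          try { writer.close(); } catch (IOException ignored) { }
          writer = roller.rollWriter();  // new hlog file, new pipeline
          writer.sync();                 // let a second failure propagate
        }
      }
    }

The point is simply that the retried sync runs against datanodes that were
healthy at roll time, so restarting one datanode should not be enough to
abort the server.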

 
Best regards,

     - Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein (via Tom White) 


----- Original Message -----
> From: Ted Yu <yuzhihong@gmail.com>
> To: user@hbase.apache.org
> Cc: 
> Sent: Tuesday, February 7, 2012 3:45 AM
> Subject: Re: Should a data node restart cause a region server to go down?
> 
> In your case, Error Recovery wasn't successful because of:
> All datanodes 10.49.29.92:50010 are bad. Aborting...
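
That "All datanodes ... are bad" IOException is also what is hiding under the
"java.io.IOException: Reflection" at the top of the fatal trace further down.
SequenceFileLogWriter invokes the HDFS-200 syncFs method via reflection
(the method only exists on append-capable HDFS), so the real cause comes back
wrapped in an InvocationTargetException. Roughly, as a sketch pieced together
from the stack trace rather than the exact HBase source:

    import java.io.IOException;
    import java.lang.reflect.Method;

    public class ReflectiveSyncSketch {
      private final Object writer;  // stand-in for the underlying SequenceFile.Writer
      private final Method syncFs;  // looked up by name; null if HDFS has no syncFs

      public ReflectiveSyncSketch(Object writer, Method syncFs) {
        this.writer = writer;
        this.syncFs = syncFs;
      }

      public void sync() throws IOException {
        if (this.syncFs != null) {
          try {
            this.syncFs.invoke(this.writer);  // may fail on a dead pipeline
          } catch (Exception e) {
            // e is an InvocationTargetException whose cause is the real error,
            // here "All datanodes 10.49.29.92:50010 are bad. Aborting..."
            throw (IOException) new IOException("Reflection").initCause(e);
          }
        }
      }
    }

So when reading these traces, the interesting exception is the last
"Caused by", not the "Reflection" wrapper.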
> 
> On Mon, Feb 6, 2012 at 10:28 AM, Jeff Whiting <jeffw@qualtrics.com> wrote:
> 
>>  I was increasing the storage on some of my data nodes and thus had to
>>  restart the data node. I use cdh3u2 and ran
>>  "/etc/init.d/hadoop-0.20-datanode restart" (I don't think this is a CDH
>>  problem). Unfortunately, doing the restart caused region servers to go
>>  offline. Is this expected behavior? It seems like it should recover just
>>  fine without giving up and dying, since there were other data nodes
>>  available. Here are the logs on the region server from when I restarted
>>  the data node to when it decided to give up. To give you a little
>>  background, I'm running a small cluster with 4 region servers and 4 data
>>  nodes.
>> 
>>  Thanks,
>>  ~Jeff
>> 
>>  12/02/06 18:06:03 WARN hdfs.DFSClient: DFSOutputStream ResponseProcessor
>>  exception for block blk_-4249058562504578427_18197 java.io.IOException:
>>  Bad response 1 for block blk_-4249058562504578427_18197 from datanode
>>  10.49.129.134:50010
>>     at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$ResponseProcessor.run(DFSClient.java:2664)
>> 
>>  12/02/06 18:06:03 INFO hdfs.DFSClient: Error Recovery for block
>>  blk_-4249058562504578427_18197 waiting for responder to exit.
>>  12/02/06 18:06:03 WARN hdfs.DFSClient: Error Recovery for block
>>  blk_-4249058562504578427_18197 bad datanode[2] 10.49.129.134:50010
>>  12/02/06 18:06:03 WARN hdfs.DFSClient: Error Recovery for block
>>  blk_-4249058562504578427_18197 in pipeline 10.59.39.142:50010,
>>  10.234.50.225:50010, 10.49.129.134:50010, 10.49.29.92:50010: bad datanode
>>  10.49.129.134:50010
>>  12/02/06 18:06:03 WARN wal.HLog: HDFS pipeline error detected. Found 3
>>  replicas but expecting 4 replicas.  Requesting close of hlog.
>>  12/02/06 18:06:03 INFO wal.SequenceFileLogWriter: Using syncFs -- HDFS-200
>>  12/02/06 18:06:03 WARN regionserver.ReplicationSourceManager:
>>  Replication stopped, won't add new log
>>  12/02/06 18:06:03 INFO wal.HLog: Roll
>>  /hbase/.logs/ip-10-59-39-142.eu-west-1.compute.internal,60020,1328142685179/ip-10-59-39-142.eu-west-1.compute.internal%3A60020.1328549504988,
>>  entries=3644, filesize=12276680. New hlog
>>  /hbase/.logs/ip-10-59-39-142.eu-west-1.compute.internal,60020,1328142685179/ip-10-59-39-142.eu-west-1.compute.internal%3A60020.1328551563518
>> 
>>  12/02/06 18:06:04 INFO hdfs.DFSClient: Exception in
>>  createBlockOutputStream java.io.IOException: Bad connect ack with
>>  firstBadLink as 10.49.129.134:50010
>>  12/02/06 18:06:04 INFO hdfs.DFSClient: Abandoning block
>>  blk_6156813298944908969_18211
>>  12/02/06 18:06:04 INFO hdfs.DFSClient: Excluding datanode
>>  10.49.129.134:50010
>>  12/02/06 18:06:04 WARN wal.HLog: HDFS pipeline error detected. Found 3
>>  replicas but expecting 4 replicas.  Requesting close of hlog.
>>  12/02/06 18:07:06 WARN hdfs.DFSClient: DFSOutputStream ResponseProcessor
>>  exception for block blk_-165678744483388406_18211 java.io.IOException:
>>  Bad response 1 for block blk_-165678744483388406_18211 from datanode
>>  10.234.50.225:50010
>>     at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$ResponseProcessor.run(DFSClient.java:2664)
>> 
>>  12/02/06 18:07:06 INFO hdfs.DFSClient: Error Recovery for block
>>  blk_-165678744483388406_18211 waiting for responder to exit.
>>  12/02/06 18:07:06 WARN hdfs.DFSClient: Error Recovery for block
>>  blk_-165678744483388406_18211 bad datanode[2] 10.234.50.225:50010
>>  12/02/06 18:07:06 WARN hdfs.DFSClient: Error Recovery for block
>>  blk_-165678744483388406_18211 in pipeline 10.59.39.142:50010,
>>  10.49.29.92:50010, 10.234.50.225:50010: bad datanode 10.234.50.225:50010
>>  12/02/06 18:09:21 WARN hdfs.DFSClient: DFSOutputStream ResponseProcessor
>>  exception for block blk_-165678744483388406_18214 java.io.IOException:
>>  Connection reset by peer
>>     at sun.nio.ch.FileDispatcher.read0(Native Method)
>>     at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:21)
>>     at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:237)
>>     at sun.nio.ch.IOUtil.read(IOUtil.java:210)
>>     at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:236)
>>     at org.apache.hadoop.net.SocketInputStream$Reader.performIO(SocketInputStream.java:55)
>>     at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142)
>>     at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:155)
>>     at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:128)
>>     at java.io.DataInputStream.readFully(DataInputStream.java:178)
>>     at java.io.DataInputStream.readLong(DataInputStream.java:399)
>>     at org.apache.hadoop.hdfs.protocol.DataTransferProtocol$PipelineAck.readFields(DataTransferProtocol.java:120)
>>     at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$ResponseProcessor.run(DFSClient.java:2634)
>> 
>>  12/02/06 18:09:21 INFO hdfs.DFSClient: Error Recovery for block
>>  blk_-165678744483388406_18214 waiting for responder to exit.
>>  12/02/06 18:09:21 WARN hdfs.DFSClient: Error Recovery for block
>>  blk_-165678744483388406_18214 bad datanode[0] 10.59.39.142:50010
>>  12/02/06 18:09:21 WARN hdfs.DFSClient: Error Recovery for block
>>  blk_-165678744483388406_18214 in pipeline 10.59.39.142:50010,
>>  10.49.29.92:50010: bad datanode 10.59.39.142:50010
>>  12/02/06 18:09:55 WARN hdfs.DFSClient: DFSOutputStream ResponseProcessor
>>  exception for block blk_-165678744483388406_18221 java.io.IOException:
>>  Connection reset by peer
>>     at sun.nio.ch.FileDispatcher.read0(Native Method)
>>     at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:21)
>>     at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:237)
>>     at sun.nio.ch.IOUtil.read(IOUtil.java:210)
>>     at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:236)
>>     at org.apache.hadoop.net.SocketInputStream$Reader.performIO(SocketInputStream.java:55)
>>     at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142)
>>     at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:155)
>>     at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:128)
>>     at java.io.DataInputStream.readFully(DataInputStream.java:178)
>>     at java.io.DataInputStream.readLong(DataInputStream.java:399)
>>     at org.apache.hadoop.hdfs.protocol.DataTransferProtocol$PipelineAck.readFields(DataTransferProtocol.java:120)
>>     at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$ResponseProcessor.run(DFSClient.java:2634)
>> 
>>  12/02/06 18:09:55 INFO hdfs.DFSClient: Error Recovery for block
>>  blk_-165678744483388406_18221 waiting for responder to exit.
>>  12/02/06 18:09:56 WARN hdfs.DFSClient: Error Recovery for block
>>  blk_-165678744483388406_18221 bad datanode[0] 10.49.29.92:50010
>>  12/02/06 18:09:56 WARN hdfs.DFSClient: Error while syncing
>>  java.io.IOException: All datanodes 10.49.29.92:50010 are bad. Aborting...
>>     at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:2766)
>>     at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$1600(DFSClient.java:2305)
>>     at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2477)
>>  12/02/06 18:09:56 WARN hdfs.DFSClient: Error while syncing
>>  java.io.IOException: All datanodes 10.49.29.92:50010 are bad. Aborting...
>>     at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:2766)
>>     at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$1600(DFSClient.java:2305)
>>     at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2477)
>>  12/02/06 18:09:56 FATAL wal.HLog: Could not append. Requesting close of
>>  hlog
>>  java.io.IOException: Reflection
>>     at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.sync(SequenceFileLogWriter.java:147)
>>     at org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:981)
>>     at org.apache.hadoop.hbase.regionserver.wal.HLog$LogSyncer.run(HLog.java:958)
>>  Caused by: java.lang.reflect.InvocationTargetException
>>     at sun.reflect.GeneratedMethodAccessor5.invoke(Unknown Source)
>>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>     at java.lang.reflect.Method.invoke(Method.java:597)
>>     at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.sync(SequenceFileLogWriter.java:145)
>>     ... 2 more
>>  Caused by: java.io.IOException: All datanodes 10.49.29.92:50010 are bad.
>>  Aborting...
>>     at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:2766)
>>     at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$1600(DFSClient.java:2305)
>>     at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2477)
>>  12/02/06 18:09:56 ERROR wal.HLog: Error while syncing, requesting close of
>>  hlog
>>  java.io.IOException: Reflection
>>     at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.sync(SequenceFileLogWriter.java:147)
>>     at org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:981)
>>     at org.apache.hadoop.hbase.regionserver.wal.HLog$LogSyncer.run(HLog.java:958)
>>  Caused by: java.lang.reflect.InvocationTargetException
>>     at sun.reflect.GeneratedMethodAccessor5.invoke(Unknown Source)
>>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>     at java.lang.reflect.Method.invoke(Method.java:597)
>>     at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.sync(SequenceFileLogWriter.java:145)
>>     ... 2 more
>>  Caused by: java.io.IOException: All datanodes 10.49.29.92:50010 are bad.
>>  Aborting...
>> 
>>  --
>>  Jeff Whiting
>>  Qualtrics Senior Software Engineer
>>  jeffw@qualtrics.com
>> 
>> 
> 
