hbase-user mailing list archives

From Ian Brooks <i.bro...@sensewhere.com>
Subject Re: regionserver crash after node decomission
Date Thu, 05 Jun 2014 11:33:30 GMT
Hi,

The logs around that time for .112 are:

2014-06-04 13:55:08,797 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace:
src: /10.143.38.112:50010, dest: /10.143.38.105:34986, bytes: 131072, op: HDFS_READ, cliID:
DFSClient_hb_rs_sw-hadoop-004,16020,1401718647248_343372106_28, offset: 0, srvID: 112a7edd-e879-4497-bc3b-0e4d5b88f45b,
blockid: BP-2121456822-10.143.38.149-1396953188241:blk_1073994110_253327, duration: 84803151
2014-06-04 13:55:53,254 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace:
src: /10.143.38.112:50010, dest: /10.143.38.112:32763, bytes: 5794, op: HDFS_READ, cliID:
DFSClient_hb_rs_sw-hadoop-005,16020,1401885988003_-537176952_28, offset: 0, srvID: 112a7edd-e879-4497-bc3b-0e4d5b88f45b,
blockid: BP-2121456822-10.143.38.149-1396953188241:blk_1074085666_344919, duration: 372334
2014-06-04 13:56:02,891 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace:
src: /10.143.38.112:50010, dest: /10.143.38.112:32764, bytes: 5794, op: HDFS_READ, cliID:
DFSClient_hb_rs_sw-hadoop-005,16020,1401885988003_-537176952_28, offset: 0, srvID: 112a7edd-e879-4497-bc3b-0e4d5b88f45b,
blockid: BP-2121456822-10.143.38.149-1396953188241:blk_1074085666_344919, duration: 429859
2014-06-04 13:57:09,690 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving BP-2121456822-10.143.38.149-1396953188241:blk_1074085687_344940
src: /10.143.38.112:32770 dest: /10.143.38.112:50010
2014-06-04 14:00:11,213 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving BP-2121456822-10.143.38.149-1396953188241:blk_1074085689_344942
src: /10.143.38.104:45623 dest: /10.143.38.112:50010
2014-06-04 14:00:15,438 INFO org.apache.hadoop.util.JvmPauseMonitor: Detected pause in JVM
or host machine (eg GC): pause of approximately 1223ms
No GCs detected
2014-06-04 14:00:22,136 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving BP-2121456822-10.143.38.149-1396953188241:blk_1074085691_344944
src: /10.143.38.149:30394 dest: /10.143.38.112:50010
2014-06-04 14:00:22,152 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: DataNode{data=FSDataset{dirpath='[/home/hadoop/hdfs/datanode/current]'},
localName='sw-hadoop-005:50010', datanodeUuid='112a7edd-e879-4497-bc3b-0e4d5b88f45b', xmitsInProgress=0}:Exception
transfering block BP-2121456822-10.143.38.149-1396953188241:blk_1074085691_344944 to mirror
10.143.38.105:50010: java.net.ConnectException: Connection refused
2014-06-04 14:00:22,152 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: opWriteBlock
BP-2121456822-10.143.38.149-1396953188241:blk_1074085691_344944 received exception java.net.ConnectException:
Connection refused
2014-06-04 14:00:22,152 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: sw-hadoop-005:50010:DataXceiver
error processing WRITE_BLOCK operation  src: /10.143.38.149:30394 dest: /10.143.38.112:50010
java.net.ConnectException: Connection refused
        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
        at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
        at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
        at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:529)
        at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:493)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:484)
        at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:115)
        at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:68)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:221)
        at java.lang.Thread.run(Thread.java:744)
2014-06-04 14:00:22,365 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving BP-2121456822-10.143.38.149-1396953188241:blk_1074085694_344947
src: /10.143.38.115:20551 dest: /10.143.38.112:50010
2014-06-04 14:00:22,368 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving BP-2121456822-10.143.38.149-1396953188241:blk_1074085697_344950
src: /10.143.38.104:45654 dest: /10.143.38.112:50010
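
As an aside on the JvmPauseMonitor line at 14:00:15: that is just a watchdog
thread noticing it overslept. In outline it works like the following sketch
(a simplified stand-in in the spirit of org.apache.hadoop.util.JvmPauseMonitor,
not the real class; thresholds and wording are approximate):

import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Simplified pause detector: sleep a fixed interval, measure how long the
// sleep actually took, and report the difference if it is suspiciously large.
public class PauseMonitorSketch implements Runnable {
    private static final long SLEEP_MS = 500;  // requested sleep per iteration
    private static final long WARN_MS = 1000;  // report pauses longer than this

    @Override
    public void run() {
        while (!Thread.currentThread().isInterrupted()) {
            long gcBefore = totalGcCount();
            long start = System.nanoTime();
            try {
                Thread.sleep(SLEEP_MS);
            } catch (InterruptedException e) {
                return;
            }
            long extraMs = (System.nanoTime() - start) / 1_000_000 - SLEEP_MS;
            if (extraMs > WARN_MS) {
                // Oversleeping means the JVM or the host stalled: GC, swap,
                // hypervisor steal, etc. If GC counts didn't move, it wasn't GC.
                boolean gcRan = totalGcCount() > gcBefore;
                System.out.println("Detected pause of approximately " + extraMs
                        + "ms" + (gcRan ? "" : "; no GCs detected"));
            }
        }
    }

    private static long totalGcCount() {
        long count = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            count += gc.getCollectionCount();
        }
        return count;
    }
}

A ~1.2s pause with "No GCs detected" usually points at the host (swap,
virtualisation steal) rather than the JVM heap.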


The 'Connection refused' failure at 14:00:22 could be related, though it is
some three minutes after the 13:57 errors, and all hosts are time
synchronised. There is nothing in the logs on the hosts to suggest the clock
was adjusted by any amount.
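
For what it's worth, the 'Connection refused' at 14:00:22 means nothing was
listening on 10.143.38.105:50010 any more, which is consistent with that
datanode having been stopped: the kernel on the target rejects the connection
outright. A minimal illustration (nothing HDFS-specific; assumes nothing is
listening on port 50010 on the local host):

import java.net.ConnectException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Connecting to a port with no listener: the remote kernel answers the SYN
// with RST and the JVM surfaces it as ConnectException: Connection refused.
public class ConnectRefusedDemo {
    public static void main(String[] args) throws Exception {
        try (Socket s = new Socket()) {
            s.connect(new InetSocketAddress("127.0.0.1", 50010), 2000);
        } catch (ConnectException e) {
            System.out.println("Got: " + e.getMessage()); // "Connection refused"
        }
    }
}

'Connection reset by peer', by contrast, means the connection did exist and
the other end (or its OS) tore it down mid-stream.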


-Ian Brooks

On Wed, Jun 4, 2014 at 7:08 AM, Ian Brooks <i.brooks@sensewhere.com> wrote:

> Hi
>
> Well I performed the procedure on another 4 nodes today and no node fell
> over from it, so perhaps I was just very unlucky with the 2 previous
> attempts.



There is an issue in DFSClient. Need to take a look...



> When I shut down the datanodes I saw errors in the logs, but they are to
> be expected, and the servers just continued as normal after that. The only
> slight oddity was that the datanode in question for the log below was
> 10.143.38.105, so I'm not sure why it was complaining about 10.143.38.112
> and 10.143.38.116, as they were both running and healthy.
>
>
>
Anything in logs of 10.143.38.112 around this time?

The ERROR message is composed by the catch block in DataXceiver#run. It has
10.143.38.116 as the localAddress and .112 as the remote (did .112 close the
connection, hence the 'Connection reset by peer'?)
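
For reference, that ERROR line comes from the catch-all at the bottom of
DataXceiver#run; schematically it is shaped like this (a paraphrase with
stand-in fields and logging, not the verbatim Hadoop source):

// Stand-in sketch of the catch-all in DataXceiver#run. Field names and the
// failing call are hypothetical; only the shape of the log line matters.
public class XceiverSketch implements Runnable {
    private final String displayName;   // e.g. "sw-hadoop-007:50010"
    private final String remoteAddress; // src: the peer end of the socket
    private final String localAddress;  // dest: this datanode's own address

    public XceiverSketch(String displayName, String remoteAddress, String localAddress) {
        this.displayName = displayName;
        this.remoteAddress = remoteAddress;
        this.localAddress = localAddress;
    }

    @Override
    public void run() {
        String op = null;
        try {
            op = "WRITE_BLOCK";       // stands in for decoding the opcode
            receiveBlockFromPeer();   // stands in for writeBlock() and friends
        } catch (Throwable t) {
            // Any failure while serving the op lands here, so the node that
            // reports the error is not necessarily the node that caused it.
            System.err.printf("%s:DataXceiver error processing %s operation  src: %s dest: %s%n",
                    displayName, op == null ? "unknown" : op, remoteAddress, localAddress);
            t.printStackTrace();
        }
    }

    private void receiveBlockFromPeer() throws java.io.IOException {
        // Hypothetical: the peer resetting the socket mid-write would
        // surface here as "Connection reset by peer".
        throw new java.io.IOException("Connection reset by peer");
    }
}

So in the quoted ERROR, src 10.143.38.112 is just the remote end of the socket
that sw-hadoop-007 (.116) was reading from when the reset arrived.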

St.Ack



>
> 2014-06-04 13:57:07,641 INFO
> org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder:
> BP-2121456822-10.143.38.149-1396953188241:blk_1074085660_344913,
> type=HAS_DOWNSTREAM_IN_PIPELINE
> java.io.EOFException: Premature EOF: no length prefix available
>         at
> org.apache.hadoop.hdfs.protocolPB.PBHelper.vintPrefixed(PBHelper.java:1883)
>         at
> org.apache.hadoop.hdfs.protocol.datatransfer.PipelineAck.readFields(PipelineAck.java:116)
>         at
> org.apache.hadoop.hdfs.server.datanode.BlockReceiver$PacketResponder.run(BlockReceiver.java:958)
>         at java.lang.Thread.run(Thread.java:744)
> 2014-06-04 13:57:07,646 INFO
> org.apache.hadoop.hdfs.server.datanode.DataNode: Exception for
> BP-2121456822-10.143.38.149-1396953188241:blk_1074085660_344913
> java.io.IOException: Connection reset by peer
>         at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
>         at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
>         at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
>         at sun.nio.ch.IOUtil.read(IOUtil.java:197)
>         at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:379)
>         at
> org.apache.hadoop.net.SocketInputStream$Reader.performIO(SocketInputStream.java:57)
>         at
> org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142)
>         at
> org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161)
>         at
> org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:131)
>         at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
>         at java.io.BufferedInputStream.read1(BufferedInputStream.java:275)
>         at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
>         at java.io.DataInputStream.read(DataInputStream.java:149)
>         at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:192)
>         at
> org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doReadFully(PacketReceiver.java:213)
>         at
> org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doRead(PacketReceiver.java:134)
>         at
> org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.receiveNextPacket(PacketReceiver.java:109)
>         at
> org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:442)
>         at
> org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:701)
>         at
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:572)
>         at
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:115)
>         at
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:68)
>         at
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:221)
>         at java.lang.Thread.run(Thread.java:744)
> 2014-06-04 13:57:07,647 WARN
> org.apache.hadoop.hdfs.server.datanode.DataNode: IOException in
> BlockReceiver.run():
> java.nio.channels.ClosedByInterruptException
>         at
> java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:202)
>         at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:496)
>         at
> org.apache.hadoop.net.SocketOutputStream$Writer.performIO(SocketOutputStream.java:63)
>         at
> org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142)
>         at
> org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:159)
>         at
> org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:117)
>         at
> java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
>         at
> java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140)
>         at java.io.DataOutputStream.flush(DataOutputStream.java:123)
>         at
> org.apache.hadoop.hdfs.server.datanode.BlockReceiver$PacketResponder.run(BlockReceiver.java:1087)
>         at java.lang.Thread.run(Thread.java:744)
> 2014-06-04 13:57:07,648 WARN
> org.apache.hadoop.hdfs.server.datanode.DataNode: checkDiskError: exception:
> java.nio.channels.ClosedByInterruptException
>         at
> java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:202)
>         at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:496)
>         at
> org.apache.hadoop.net.SocketOutputStream$Writer.performIO(SocketOutputStream.java:63)
>         at
> org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142)
>         at
> org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:159)
>         at
> org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:117)
>         at
> java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
>         at
> java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140)
>         at java.io.DataOutputStream.flush(DataOutputStream.java:123)
>         at
> org.apache.hadoop.hdfs.server.datanode.BlockReceiver$PacketResponder.run(BlockReceiver.java:1087)
>         at java.lang.Thread.run(Thread.java:744)
> 2014-06-04 13:57:07,648 INFO
> org.apache.hadoop.hdfs.server.datanode.DataNode: Not checking disk as
> checkDiskError was called on a network related exception
> 2014-06-04 13:57:07,648 INFO
> org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder:
> BP-2121456822-10.143.38.149-1396953188241:blk_1074085660_344913,
> type=HAS_DOWNSTREAM_IN_PIPELINE
> java.nio.channels.ClosedByInterruptException
>         at
> java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:202)
>         at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:496)
>         at
> org.apache.hadoop.net.SocketOutputStream$Writer.performIO(SocketOutputStream.java:63)
>         at
> org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142)
>         at
> org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:159)
>         at
> org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:117)
>         at
> java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
>         at
> java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140)
>         at java.io.DataOutputStream.flush(DataOutputStream.java:123)
>         at
> org.apache.hadoop.hdfs.server.datanode.BlockReceiver$PacketResponder.run(BlockReceiver.java:1087)
>         at java.lang.Thread.run(Thread.java:744)
> 2014-06-04 13:57:07,648 INFO
> org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder:
> BP-2121456822-10.143.38.149-1396953188241:blk_1074085660_344913,
> type=HAS_DOWNSTREAM_IN_PIPELINE terminating
> 2014-06-04 13:57:07,648 INFO
> org.apache.hadoop.hdfs.server.datanode.DataNode: opWriteBlock
> BP-2121456822-10.143.38.149-1396953188241:blk_1074085660_344913 received
> exception java.io.IOException: Connection reset by peer
> 2014-06-04 13:57:07,649 ERROR
> org.apache.hadoop.hdfs.server.datanode.DataNode:
> sw-hadoop-007:50010:DataXceiver error processing WRITE_BLOCK operation
>  src: /10.143.38.112:24190 dest: /10.143.38.116:50010
> java.io.IOException: Connection reset by peer
>         at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
>         at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
>         at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
>         at sun.nio.ch.IOUtil.read(IOUtil.java:197)
>         at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:379)
>         at
> org.apache.hadoop.net.SocketInputStream$Reader.performIO(SocketInputStream.java:57)
>         at
> org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142)
>         at
> org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161)
>         at
> org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:131)
>         at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
>         at java.io.BufferedInputStream.read1(BufferedInputStream.java:275)
>         at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
>         at java.io.DataInputStream.read(DataInputStream.java:149)
>         at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:192)
>         at
> org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doReadFully(PacketReceiver.java:213)
>         at
> org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doRead(PacketReceiver.java:134)
>         at
> org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.receiveNextPacket(PacketReceiver.java:109)
>         at
> org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:442)
>         at
> org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:701)
>         at
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:572)
>         at
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:115)
>         at
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:68)
>         at
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:221)
>         at java.lang.Thread.run(Thread.java:744)
>
>
> -Ian Brooks
