hadoop-common-user mailing list archives

From Pablo Musa <pa...@psafe.com>
Subject RE: Datanode error
Date Mon, 23 Jul 2012 14:28:01 GMT
I am sorry, but I received an error when I sent the message to the list, and all responses
went to my junk mail folder. So I tried to send it again, and only then noticed your emails.

>Please do also share if you're seeing an issue that you think is 
>related to these log messages.

My datanodes do not have any big problem, but my regionservers are being shut down by
timeouts, and I think it is related to the datanodes. I have already tried a lot of different
configurations, but they keep "crashing". I asked on the hbase list, but we could not find
anything (the RSs seem healthy). We have 10 RSs and they get shut down 7 times per day.

So I thought maybe you guys could find what is wrong with my system.

Thanks again,
Pablo
 
-----Original Message-----
From: Raj Vishwanathan [mailto:rajvish@yahoo.com] 
Sent: Friday, July 20, 2012 14:38
To: common-user@hadoop.apache.org
Subject: Re: Datanode error

Could also be due to network issues. The number of available sockets or the number of
threads could be too low.

Raj



>________________________________
> From: Harsh J <harsh@cloudera.com>
>To: common-user@hadoop.apache.org
>Sent: Friday, July 20, 2012 9:06 AM
>Subject: Re: Datanode error
> 
>Pablo,
>
>These all seem to be timeouts from clients when they wish to read a 
>block and drops from clients when they try to write a block. I wouldn't 
>think of them as critical errors. Aside from being worried that a DN is 
>logging these, are you noticing any usability issue in your cluster? If 
>not, I'd simply blame this on stuff like speculative tasks, region 
>servers, general HDFS client misbehavior, etc.
>
>Please do also share if you're seeing an issue that you think is 
>related to these log messages.
>
>On Fri, Jul 20, 2012 at 6:37 PM, Pablo Musa <pablo@psafe.com> wrote:
>> Hey guys,
>> I have a cluster with 11 nodes (1 NN and 10 DNs) which is running and working.
>> However my datanodes keep having the same errors, over and over.
>>
>> I googled the problems and tried different flags (ex: 
>> -XX:MaxDirectMemorySize=2G) and different configs (xceivers=8192) but could not solve it.
>>
>> Does anyone know what the problem is and how I can solve it? (the 
>> stacktrace is at the end)
>>
>> I am running:
>> Java 1.7
>> Hadoop 0.20.2
>> HBase 0.90.6
>> ZooKeeper 3.3.5
>>
>> % top -> shows low load average (6% most of the time up to 60%), already considering the number of cpus
>> % vmstat -> shows no swap at all
>> % sar -> shows 75% idle cpu in the worst case
>>
>> Hope you guys can help me.
>> Thanks in advance,
>> Pablo
>>
>> 2012-07-20 00:03:44,455 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /DN01:50010, dest: /DN01:43516, bytes: 396288, op: HDFS_READ, cliID: DFSClient_hb_rs_DN01,60020,1342734302945_1342734303427, offset: 54956544, srvID: DS-798921853-DN01-50010-1328651609047, blockid: blk_914960691839012728_14061688, duration: 480061254006
>> 2012-07-20 00:03:44,455 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(DN01:50010, storageID=DS-798921853-DN01-50010-1328651609047, infoPort=50075, ipcPort=50020):Got exception while serving blk_914960691839012728_14061688 to /DN01:
>> java.net.SocketTimeoutException: 480000 millis timeout while waiting for channel to be ready for write. ch : java.nio.channels.SocketChannel[connected local=/DN01:50010 remote=/DN01:43516]
>>         at org.apache.hadoop.net.SocketIOWithTimeout.waitForIO(SocketIOWithTimeout.java:246)
>>         at org.apache.hadoop.net.SocketOutputStream.waitForWritable(SocketOutputStream.java:159)
>>         at org.apache.hadoop.net.SocketOutputStream.transferToFully(SocketOutputStream.java:198)
>>         at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendChunks(BlockSender.java:397)
>>         at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:493)
>>         at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:279)
>>         at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:175)
>>
>> 2012-07-20 00:03:44,455 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(DN01:50010, storageID=DS-798921853-DN01-50010-1328651609047, infoPort=50075, ipcPort=50020):DataXceiver
>> java.net.SocketTimeoutException: 480000 millis timeout while waiting for channel to be ready for write. ch : java.nio.channels.SocketChannel[connected local=/DN01:50010 remote=/DN01:43516]
>>         at org.apache.hadoop.net.SocketIOWithTimeout.waitForIO(SocketIOWithTimeout.java:246)
>>         at org.apache.hadoop.net.SocketOutputStream.waitForWritable(SocketOutputStream.java:159)
>>         at org.apache.hadoop.net.SocketOutputStream.transferToFully(SocketOutputStream.java:198)
>>         at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendChunks(BlockSender.java:397)
>>         at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:493)
>>         at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:279)
>>         at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:175)
>>
>> 2012-07-20 00:12:11,949 INFO org.apache.hadoop.hdfs.server.datanode.DataBlockScanner: Verification succeeded for blk_4602445008578088178_5707787
>> 2012-07-20 00:12:11,962 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: writeBlock blk_-8916344806514717841_14081066 received exception java.net.SocketTimeoutException: 63000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/DN01:36634 remote=/DN03:50010]
>> 2012-07-20 00:12:11,962 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(DN01:50010, storageID=DS-798921853-DN01-50010-1328651609047, infoPort=50075, ipcPort=50020):DataXceiver
>> java.net.SocketTimeoutException: 63000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/DN01:36634 remote=/DN03:50010]
>>         at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164)
>>         at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:155)
>>         at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:128)
>>         at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:116)
>>         at java.io.FilterInputStream.read(FilterInputStream.java:83)
>>         at java.io.DataInputStream.readShort(DataInputStream.java:312)
>>         at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:447)
>>         at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:183)
>>
>>
>> 2012-07-20 00:12:20,670 INFO org.apache.hadoop.hdfs.server.datanode.DataBlockScanner: Verification succeeded for blk_7238561256016868237_3555939
>> 2012-07-20 00:12:22,541 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving block blk_-7028120671250332363_14081073 src: /DN03:50331 dest: /DN01:50010
>> 2012-07-20 00:12:22,544 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Exception in receiveBlock for block blk_-7028120671250332363_14081073 java.io.EOFException: while trying to read 65557 bytes
>> 2012-07-20 00:12:22,544 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder 0 for block blk_-7028120671250332363_14081073 Interrupted.
>> 2012-07-20 00:12:22,544 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder 0 for block blk_-7028120671250332363_14081073 terminating
>> 2012-07-20 00:12:22,544 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: writeBlock blk_-7028120671250332363_14081073 received exception java.io.EOFException: while trying to read 65557 bytes
>> 2012-07-20 00:12:22,544 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(DN01:50010, storageID=DS-798921853-DN01-50010-1328651609047, infoPort=50075, ipcPort=50020):DataXceiver
>> java.io.EOFException: while trying to read 65557 bytes
>>         at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.readToBuf(BlockReceiver.java:290)
>>         at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.readNextPacket(BlockReceiver.java:334)
>>         at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:398)
>>         at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:577)
>>         at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:494)
>>         at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:183)
>>
>>
>> 2012-07-20 00:12:34,266 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving block blk_-1834839455324747507_14081046 src: /DN05:59897 dest: /DN01:50010
>> 2012-07-20 00:12:34,267 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Exception in receiveBlock for block blk_-1834839455324747507_14081046 java.io.EOFException: while trying to read 65557 bytes
>> 2012-07-20 00:12:34,268 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder 0 for block blk_-1834839455324747507_14081046 Interrupted.
>> 2012-07-20 00:12:34,268 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder 0 for block blk_-1834839455324747507_14081046 terminating
>> 2012-07-20 00:12:34,268 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: writeBlock blk_-1834839455324747507_14081046 received exception java.io.EOFException: while trying to read 65557 bytes
>> 2012-07-20 00:12:34,268 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(DN01:50010, storageID=DS-798921853-DN01-50010-1328651609047, infoPort=50075, ipcPort=50020):DataXceiver
>> java.io.EOFException: while trying to read 65557 bytes
>>         at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.readToBuf(BlockReceiver.java:290)
>>         at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.readNextPacket(BlockReceiver.java:334)
>>         at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:398)
>>         at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:577)
>>         at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:494)
>>         at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:183)
>> 2012-07-20 00:12:34,269 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving block blk_3941134611454287401_14080990 src: /DN03:50345 dest: /DN01:50010
>> 2012-07-20 00:12:34,270 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Exception in receiveBlock for block blk_3941134611454287401_14080990 java.io.EOFException: while trying to read 65557 bytes
>> 2012-07-20 00:12:34,270 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder 0 for block blk_3941134611454287401_14080990 Interrupted.
>> 2012-07-20 00:12:34,271 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder 0 for block blk_3941134611454287401_14080990 terminating
>> 2012-07-20 00:12:34,271 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: writeBlock blk_3941134611454287401_14080990 received exception java.io.EOFException: while trying to read 65557 bytes
>> 2012-07-20 00:12:34,271 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(DN01:50010, storageID=DS-798921853-DN01-50010-1328651609047, infoPort=50075, ipcPort=50020):DataXceiver
>> java.io.EOFException: while trying to read 65557 bytes
>>         at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.readToBuf(BlockReceiver.java:290)
>>         at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.readNextPacket(BlockReceiver.java:334)
>>         at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:398)
>>         at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:577)
>>         at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:494)
>>         at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:183)
>
>
>
>--
>Harsh J
>
>
>
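
For anyone triaging a similar DataNode log, it can help to tally the DataXceiver errors by kind before deciding whether they matter. Below is a minimal sketch, not part of the original thread. It assumes the raw (unquoted, unwrapped) log format shown above, where an ERROR record ending in ":DataXceiver" is followed by the exception line. The function name summarize and the label strings are made up for illustration.

```python
import re
from collections import Counter

# Patterns copied from the messages quoted in this thread; other Hadoop
# versions may word these messages differently (assumption).
TIMEOUT_RE = re.compile(
    r"^java\.net\.SocketTimeoutException: (\d+) millis timeout while waiting "
    r"for channel to be ready for (read|write)\."
)
EOF_RE = re.compile(r"^java\.io\.EOFException: while trying to read (\d+) bytes")
# An ERROR record whose message ends in ":DataXceiver" precedes each exception.
ERROR_RE = re.compile(
    r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3} ERROR .*:DataXceiver$"
)


def summarize(lines):
    """Count exceptions that directly follow an ERROR ...:DataXceiver record.

    Only the line after the ERROR record is inspected, so the same exception
    repeated in a WARN or INFO record is not counted twice.
    """
    counts = Counter()
    after_error = False
    for raw in lines:
        line = raw.strip()
        if after_error:
            m = TIMEOUT_RE.match(line)
            if m:
                counts[f"{m.group(2)} timeout after {m.group(1)} ms"] += 1
            else:
                m = EOF_RE.match(line)
                if m:
                    counts[f"EOF while reading {m.group(1)} bytes"] += 1
        after_error = bool(ERROR_RE.match(line))
    return counts
```

It could be run as summarize(open(path)), where path is wherever the DataNode log lives on your system. A tally dominated by one timeout value or one remote node would be a hint about where to look next, though it does not by itself show that the errors are harmful.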
