hadoop-mapreduce-user mailing list archives

From Yuzhang Han <yuzhanghan1...@gmail.com>
Subject Re: "could only be replicated to 0 nodes instead of minReplication" exception during job execution
Date Tue, 25 Jun 2013 04:04:08 GMT
Thank you, Omkar.

I didn't see any other errors in the datanode and namenode logs. My namenode 50070
interface shows:

Configured Capacity : 393.72 GB
DFS Used            : 60.86 GB
Non DFS Used        : 137.51 GB
DFS Remaining       : 195.35 GB
DFS Used%           : 15.46%
DFS Remaining%      : 49.62%
Block Pool Used     : 60.86 GB
Block Pool Used%    : 15.46%
DataNodes usages    : Min 14.55%, Median 16.37%, Max 16.37%, stdev 0.91%


It doesn't imply insufficient disk space, does it? Can you think of any
other possible cause of the exceptions?
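
For what it's worth, one way to double-check the disk-space angle is to look at each datanode individually rather than at the cluster totals, since the aggregate numbers on the 50070 page can hide a single nearly full node. "hdfs dfsadmin -report" prints this, and the sketch below does roughly the same thing through the Java client API. It is only an illustrative sketch against a Hadoop 2.x client; the class name DfsSpaceCheck is made up for the example:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.FsStatus;
    import org.apache.hadoop.hdfs.DistributedFileSystem;
    import org.apache.hadoop.hdfs.protocol.DatanodeInfo;

    public class DfsSpaceCheck {
        public static void main(String[] args) throws Exception {
            // Picks up fs.defaultFS from core-site.xml/hdfs-site.xml on the classpath.
            FileSystem fs = FileSystem.get(new Configuration());

            // Aggregate numbers, same as the 50070 front page.
            FsStatus status = fs.getStatus();
            System.out.printf("cluster: capacity=%d used=%d remaining=%d%n",
                    status.getCapacity(), status.getUsed(), status.getRemaining());

            // Per-datanode remaining space: one nearly full node can still trigger
            // "could only be replicated to 0 nodes" while the totals look healthy.
            DistributedFileSystem dfs = (DistributedFileSystem) fs;
            for (DatanodeInfo dn : dfs.getDataNodeStats()) {
                System.out.printf("%s: remaining=%.1f GB%n",
                        dn.getHostName(), dn.getRemaining() / (1024.0 * 1024 * 1024));
            }
        }
    }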


On Mon, Jun 24, 2013 at 6:17 PM, Omkar Joshi <ojoshi@hortonworks.com> wrote:

> Hi,
>
> I see there are 2 datanodes, and for some reason the namenode is not able to
> create even a single replica for the requested blocks. Are you sure the systems
> on which these datanodes are running have sufficient disk space? Do you see
> any other errors in the datanode/namenode logs?
>
> What must be happening is that, as file creation in HDFS is failing, the
> framework is marking that reduce attempt as failed and restarting it. Keep
> checking the namenode state when the reduce reaches 67%.
>
> Thanks,
> Omkar Joshi
> *Hortonworks Inc.* <http://www.hortonworks.com>
>
>
> On Mon, Jun 24, 2013 at 3:01 PM, Yuzhang Han <yuzhanghan1982@gmail.com> wrote:
>
>> Hello,
>>
>> I am using YARN. I get some exceptions at my namenode and datanode. They
>> are thrown when my reduce progress reaches 67%. The reduce phase is then
>> restarted from 0% several times, but it always fails again at this point.
>> Can someone tell me what I should do? Many thanks!
>>
>>
>> Namenode log:
>>
>> 2013-06-24 19:08:50,345 INFO BlockStateChange: BLOCK* addStoredBlock: blockMap updated: 10.224.2.190:50010 is added to blk_654446797771285606_5062{blockUCState=UNDER_CONSTRUCTION, primaryNodeIndex=-1, replicas=[ReplicaUnderConstruction[10.224.2.190:50010|RBW]]} size 0
>> 2013-06-24 19:08:50,349 WARN org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Not able to place enough replicas, still in need of 1 to reach 1
>> For more information, please enable DEBUG log level on org.apache.commons.logging.impl.Log4JLogger
>> 2013-06-24 19:08:50,350 ERROR org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:ubuntu (auth:SIMPLE) cause:java.io.IOException: File /output/_temporary/1/_temporary/attempt_1372090853102_0001_r_000002_0/part-00002 could only be replicated to 0 nodes instead of minReplication (=1).  There are 2 datanode(s) running and no node(s) are excluded in this operation.
>> 2013-06-24 19:08:50,353 INFO org.apache.hadoop.ipc.Server: IPC Server handler 1 on 9000, call org.apache.hadoop.hdfs.protocol.ClientProtocol.addBlock from 10.224.2.190:49375: error: java.io.IOException: File /output/_temporary/1/_temporary/attempt_1372090853102_0001_r_000002_0/part-00002 could only be replicated to 0 nodes instead of minReplication (=1).  There are 2 datanode(s) running and no node(s) are excluded in this operation.
>> java.io.IOException: File /output/_temporary/1/_temporary/attempt_1372090853102_0001_r_000002_0/part-00002 could only be replicated to 0 nodes instead of minReplication (=1).  There are 2 datanode(s) running and no node(s) are excluded in this operation.
>> 	at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget(BlockManager.java:1339)
>> 	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2155)
>> 	at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:491)
>> 	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:351)
>> 	at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:40744)
>> 	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:454)
>> 	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1014)
>> 	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1741)
>> 	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1737)
>> 	at java.security.AccessController.doPrivileged(Native Method)
>> 	at javax.security.auth.Subject.doAs(Subject.java:416)
>> 	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1478)
>> 	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1735)
>> 2013-06-24 19:08:50,413 INFO BlockStateChange: BLOCK* addStoredBlock: blockMap updated: 10.224.2.190:50010 is added to blk_8924314838535676494_5063{blockUCState=UNDER_CONSTRUCTION, primaryNodeIndex=-1, replicas=[ReplicaUnderConstruction[10.224.2.190:50010|RBW]]} size 0
>> 2013-06-24 19:08:50,418 WARN org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Not able to place enough replicas, still in need of 1 to reach 1
>> For more information, please enable DEBUG log level on org.apache.commons.logging.impl.Log4JLogger
>>
>>
>>
>> Datanode log:
>>
>> 2013-06-24 19:25:54,695 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder: BP-1724882733-10.10.79.145-1372090400593:blk_-2417373821601940925_6022, type=LAST_IN_PIPELINE, downstreams=0:[] terminating
>> 2013-06-24 19:25:54,699 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving BP-1724882733-10.10.79.145-1372090400593:blk_3177955398059619584_6033 src: /10.35.99.108:59710 dest: /10.35.99.108:50010
>> 2013-06-24 19:25:56,473 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Exception for BP-1724882733-10.10.79.145-1372090400593:blk_8751401862589207807_6026
>> java.io.IOException: Connection reset by peer
>>     at sun.nio.ch.FileDispatcher.read0(Native Method)
>>     at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
>>     at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:251)
>>     at sun.nio.ch.IOUtil.read(IOUtil.java:224)
>>     at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:254)
>>     at org.apache.hadoop.net.SocketInputStream$Reader.performIO(SocketInputStream.java:55)
>>     at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142)
>>     at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:159)
>>     at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:129)
>>     at java.io.FilterInputStream.read(FilterInputStream.java:133)
>>     at java.io.BufferedInputStream.read1(BufferedInputStream.java:273)
>>     at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
>>     at java.io.DataInputStream.read(DataInputStream.java:149)
>>     at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:192)
>>     at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doReadFully(PacketReceiver.java:213)
>>     at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doRead(PacketReceiver.java:171)
>>     at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.receiveNextPacket(PacketReceiver.java:109)
>>     at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:414)
>>     at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:644)
>>     at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:506)
>>     at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:98)
>>     at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:65)
>>     at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:219)
>>     at java.lang.Thread.run(Thread.java:679)
>> 2013-06-24 19:25:56,476 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder: BP-1724882733-10.10.79.145-1372090400593:blk_8751401862589207807_6026, type=LAST_IN_PIPELINE, downstreams=0:[]: Thread is interrupted.
>>
>
>
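
Regarding Omkar's suggestion above to keep watching the namenode as the reduce approaches 67%: a minimal way to do that (again just a sketch against the Hadoop 2.x client API; the class name and the 5-second interval are arbitrary) is to poll the namenode's view of free space while the job runs and see whether it drops right before the attempt fails:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.FsStatus;

    public class DfsSpacePoller {
        public static void main(String[] args) throws Exception {
            FileSystem fs = FileSystem.get(new Configuration());
            while (true) {
                // Same figures the 50070 page shows, sampled every 5 seconds.
                FsStatus s = fs.getStatus();
                System.out.printf("%tT remaining=%.1f GB used=%.1f GB%n",
                        System.currentTimeMillis(),
                        s.getRemaining() / (1024.0 * 1024 * 1024),
                        s.getUsed() / (1024.0 * 1024 * 1024));
                Thread.sleep(5000);
            }
        }
    }

If the remaining space never drops much while the "could only be replicated to 0 nodes" warnings keep appearing, then plain disk space is probably not the cause, and the datanode/namenode logs around those timestamps are the next place to look.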
