hadoop-mapreduce-user mailing list archives

From Brahma Reddy Battula <brahmareddy.batt...@huawei.com>
Subject RE: Not able to place enough replicas
Date Tue, 15 Jul 2014 11:35:55 GMT
Hi,

There are four conditions that can exclude a DN. I feel you hit one of the following, most likely (ii) or (iii). A small sketch for checking (i) and (ii) programmatically follows the list.


i) Check whether the node is (being) decommissioned.

---> Can be checked from the NameNode UI, or by executing hdfs dfsadmin -report


ii) Check the remaining capacity of the target machine.

---> Can be checked from the NameNode UI, by verifying that the remaining space in the data directories is greater than 5 * block size, or from the NameNode debug logs


iii) Check the communication traffic of the target machine.

---> Check the network usage of the DNs, or check the DN logs; you may see a "too many open files" error or an exception saying the xceiver count exceeds the limit


iv) Check whether the target rack has already been chosen for too many nodes.
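
For (i) and (ii), here is a minimal sketch of pulling the same information from the client API (assuming Hadoop 2.x with fs.defaultFS pointing at the cluster; the class name DnReport is illustrative, not from Hadoop):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.hdfs.DistributedFileSystem;
import org.apache.hadoop.hdfs.protocol.DatanodeInfo;

// Prints each live DN's decommission state and remaining capacity,
// roughly the same data "hdfs dfsadmin -report" shows.
public class DnReport {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();  // reads core-site.xml/hdfs-site.xml
    DistributedFileSystem dfs = (DistributedFileSystem) FileSystem.get(conf);
    for (DatanodeInfo dn : dfs.getDataNodeStats()) {
      System.out.printf("%s decommissioning=%b decommissioned=%b remaining=%dMB%n",
          dn.getHostName(),
          dn.isDecommissionInProgress(),       // condition (i)
          dn.isDecommissioned(),
          dn.getRemaining() / (1024 * 1024));  // condition (ii)
    }
  }
}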


FYR, here is the code that decides whether a node is a good target:


private boolean isGoodTarget(DatanodeDescriptor node,
                             long blockSize, int maxTargetPerLoc,
                             boolean considerLoad,
                             List<DatanodeDescriptor> results) {
    // check if the node is (being) decommissioned
    if (node.isDecommissionInProgress() || node.isDecommissioned()) {
      if (LOG.isDebugEnabled()) {
        threadLocalBuilder.get().append(node.toString()).append(": ")
          .append("Node ").append(NodeBase.getPath(node))
          .append(" is not chosen because the node is (being) decommissioned ");
      }
      return false;
    }

    long remaining = node.getRemaining() -
                     (node.getBlocksScheduled() * blockSize);
    // check the remaining capacity of the target machine
    if (blockSize * HdfsConstants.MIN_BLOCKS_FOR_WRITE > remaining) {
      if (LOG.isDebugEnabled()) {
        threadLocalBuilder.get().append(node.toString()).append(": ")
          .append("Node ").append(NodeBase.getPath(node))
          .append(" is not chosen because the node does not have enough space ");
      }
      return false;
    }

    // check the communication traffic of the target machine
    if (considerLoad) {
      double avgLoad = 0;
      int size = clusterMap.getNumOfLeaves();
      if (size != 0 && stats != null) {
        avgLoad = (double) stats.getTotalLoad() / size;
      }
      if (node.getXceiverCount() > (2.0 * avgLoad)) {
        if (LOG.isDebugEnabled()) {
          threadLocalBuilder.get().append(node.toString()).append(": ")
            .append("Node ").append(NodeBase.getPath(node))
            .append(" is not chosen because the node is too busy ");
        }
        return false;
      }
    }

    // check if the target rack has chosen too many nodes
    String rackname = node.getNetworkLocation();
    int counter = 1;
    for (Iterator<DatanodeDescriptor> iter = results.iterator();
         iter.hasNext();) {
      Node result = iter.next();
      if (rackname.equals(result.getNetworkLocation())) {
        counter++;
      }
    }
    if (counter > maxTargetPerLoc) {
      if (LOG.isDebugEnabled()) {
        threadLocalBuilder.get().append(node.toString()).append(": ")
          .append("Node ").append(NodeBase.getPath(node))
          .append(" is not chosen because the rack has too many chosen nodes ");
      }
      return false;
    }
    return true;
  }
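
To make the two numeric checks above concrete, a small worked example (assuming the Hadoop 2.x defaults of a 128 MB block size and HdfsConstants.MIN_BLOCKS_FOR_WRITE = 5; all the numbers are made up for illustration):

// Illustrative arithmetic for the space and load checks in isGoodTarget().
public class PlacementThresholds {
  public static void main(String[] args) {
    // (ii) space: a DN needs blockSize * MIN_BLOCKS_FOR_WRITE free, after
    // subtracting space reserved for blocks already scheduled to it.
    long blockSize = 128L * 1024 * 1024;   // dfs.blocksize default in 2.x
    long required  = blockSize * 5;        // MIN_BLOCKS_FOR_WRITE -> 640 MB
    long remaining = 1024L * 1024 * 1024   // example: 1 GB reported free
                   - 2 * blockSize;        // example: 2 scheduled blocks -> 768 MB
    System.out.println("space ok: " + (remaining >= required));    // true

    // (iii) load: a DN is rejected when its xceiver count is more than twice
    // the cluster-wide average (only when considerLoad is enabled, the
    // default for dfs.namenode.replication.considerLoad).
    double avgLoad = 10.0;                 // example: average xceivers per DN
    int xceivers   = 25;                   // example: this DN's count
    System.out.println("too busy: " + (xceivers > 2.0 * avgLoad)); // true -> excluded
  }
}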






Thanks & Regards

Brahma Reddy Battula

________________________________
From: Bogdan Raducanu [lrdbgy@gmail.com]
Sent: Tuesday, July 15, 2014 2:45 PM
To: user@hadoop.apache.org
Subject: Re: Not able to place enough replicas

The real cause is the IOException. The PriviledgedActionException is a generic exception.
Other file writes succeed in the same directory with the same user.


On Tue, Jul 15, 2014 at 4:59 AM, Yanbo Liang <yanbohappy@gmail.com> wrote:
Maybe the user 'test' has no write permission.
You can refer to the ERROR log line:

org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:test (auth:SIMPLE)


2014-07-15 2:07 GMT+08:00 Bogdan Raducanu <lrdbgy@gmail.com>:

I'm getting this error while writing many files.
org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Not able to place enough
replicas, still in need of 4 to reach 4

I've set logging to DEBUG, but still no reason is printed. There should have been a reason
after this line, but instead there is just an empty line.
Has anyone seen something like this before? It is seen on a 4-node cluster running Hadoop 2.2.
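
For reference, the reason string in that message is only built when DEBUG is enabled for the org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy logger specifically (see the LOG.isDebugEnabled() guards in the code quoted above). A minimal in-process sketch for raising just that logger, assuming the log4j 1.x backend that Hadoop 2.x bundles:

import org.apache.log4j.Level;
import org.apache.log4j.Logger;

// Sketch: raise only the placement-policy logger to DEBUG,
// leaving the rest of the NameNode logging untouched.
public class EnablePlacementDebug {
  public static void main(String[] args) {
    Logger.getLogger(
        "org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy")
        .setLevel(Level.DEBUG);
  }
}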


org.apache.hadoop.hdfs.StateChange: *DIR* NameNode.create: file /file_1002 for DFSClient_NONMAPREDUCE_839626346_1
at 192.168.180.1
org.apache.hadoop.hdfs.StateChange: DIR* NameSystem.startFile: src=/file_1002, holder=DFSClient_NONMAPREDUCE_839626346_1,
clientMachine=192.168.180.1, createParent=true, replication=4, createFlag=[CREATE, OVERWRITE]
org.apache.hadoop.hdfs.StateChange: DIR* addFile: /file_1002 is added
org.apache.hadoop.hdfs.StateChange: DIR* NameSystem.startFile: add /file_1002 to namespace
for DFSClient_NONMAPREDUCE_839
<< ... many other operations ... >>
8 seconds later:
org.apache.hadoop.hdfs.StateChange: *BLOCK* NameNode.addBlock: file /file_1002 fileId=189252
for DFSClient_NONMAPREDUCE_839626346_1
org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.getAdditionalBlock: file /file_1002
for DFSClient_NONMAPREDUCE_839626346_1
org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Not able to place enough
replicas, still in need of 4 to reach 4
<< EMPTY LINE >>
org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:test (auth:SIMPLE)
cause:java.io.IOException: File /file_1002 could only be replicated to 0 nodes instead of
minReplication (=1).  There are 4 datanode(s) running and no node(s) are excluded in this
operation.
org.apache.hadoop.ipc.Server: IPC Server handler 9 on 8020, call org.apache.hadoop.hdfs.protocol.ClientProtocol.addBlock
from 192.168.180.1:49592 Call#1321 Retry#0: error: java.io.IOException:
File /file_1002 could only be replicated to 0 nodes instead of minReplication (=1).  There
are 4 datanode(s) running and no node(s) are excluded in this operation.
java.io.IOException: File /file_1002 could only be replicated to 0 nodes instead of minReplication
(=1).  There are 4 datanode(s) running and no node(s) are excluded in this operation.
        at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget(BlockManager.java:1384)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2477)
        at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:555)
        at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:387)
        at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:59582)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2048)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2044)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2042)



