hadoop-mapreduce-user mailing list archives

From Vinayakumar B <vinayakum...@apache.org>
Subject Re: Why do non data nodes need rack awareness?
Date Fri, 03 Jun 2016 01:14:51 GMT
The rack awareness feature was introduced to distribute data blocks across
multiple racks, so that data is not lost if a whole rack fails.

When reading or writing blocks, data locality with respect to the client is
considered in order to find the closest replica. To know the nearest datanode
for a client, the client's own rack mapping is required, which is why the
namenode resolves the rack of the client even when the client is not a
datanode. If the script gives the client's real rack, a datanode on the same
rack will be chosen for reads, improving performance. If the default rack is
given for a non-datanode IP, an arbitrary datanode will be chosen to read the
data.
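As an illustration, a minimal topology script along these lines (a hypothetical sketch, not Hadoop's own script; the IPs and rack names are made up) always prints exactly one rack per argument, falling back to a default rack for non-datanode callers:

```shell
#!/bin/sh
# Hypothetical topology script sketch: map known datanode IPs to racks,
# and fall back to a default rack for any other caller (e.g. a gateway
# or Oozie host), so the namenode always gets one value per argument.
resolve_rack() {
  for ip in "$@"; do
    case "$ip" in
      10.51.28.1|10.51.28.2) echo "/rack1" ;;  # assumed datanode IPs
      10.51.28.3|10.51.28.4) echo "/rack2" ;;  # assumed datanode IPs
      *) echo "/default-rack" ;;               # non-datanode callers
    esac
  done
}
resolve_rack "$@"
```

The key contract is that the script must emit one line of output per input argument; returning nothing for an unknown IP is what triggers the error below.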

Hope this helps,

Cheers,
-Vinay
On 3 Jun 2016 03:37, "Colin Kincaid Williams" <discord@uw.edu> wrote:

Recently we had a namenode with a failed edits directory, and there was
a failover. Things appeared to be functioning properly at first, but
later we had HDFS issues.

Looking at the namenode logs, we saw

2016-06-01 20:38:18,771 ERROR
org.apache.hadoop.net.ScriptBasedMapping: Script
/etc/hadoop/conf/getRackID.sh returned 0 values when 1 were expected.
2016-06-01 20:38:18,771 WARN org.apache.hadoop.ipc.Server: IPC Server
handler 0 on 8020, call
org.apache.hadoop.hdfs.protocol.ClientProtocol.getBlockLocations from
10.51.28.100:42826 Call#484441029 Retry#0
java.lang.NullPointerException
  at
org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager.sortLocatedBlocks(DatanodeManager.java:359)
  at
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1774)
  at
org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:527)
  at
org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.getBlockLocations(AuthorizationProviderProxyClientProtocol.java:85)
  at
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:356)
  at
org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
  at
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:587)
  at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026)
  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
  at java.security.AccessController.doPrivileged(Native Method)
  at javax.security.auth.Subject.doAs(Subject.java:415)
  at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1642)
  at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)

So we could see that our rack awareness script was not returning a
value. We then changed the script to echo back the arguments it was
called with. We found a list of IPs, some of which run services like
Oozie, and some of which are our gateway servers. However, none of
these IPs belong to the datanodes themselves.
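That debugging step can be sketched roughly as follows (hypothetical; the log path is an assumption, and this is not our actual getRackID.sh): record every argument the namenode passes in, while still answering with one rack per argument so lookups keep working during the investigation:

```shell
#!/bin/sh
# Hypothetical debugging version of a topology script: append the
# caller's arguments to a log, then still emit one rack per argument
# so the namenode never receives fewer values than it asked for.
LOG="${LOG:-/tmp/rack-callers.log}"   # assumed log location
log_and_answer() {
  echo "args: $*" >> "$LOG"
  for ip in "$@"; do
    echo "/default-rack"
  done
}
log_and_answer "$@"
```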

The symptoms of this issue were that the namenode itself couldn't cat
files on the system, or make requests to move files on the history
server, etc.

From my understanding of rack awareness, we only need to provide a
rack ID for hosts that are datanodes. However, all our datanodes were
listed, and the requesting IPs were from non-datanodes.

The solution was to have the rack awareness script return a default
rack for the missing IPs. This is not made clear in the rack awareness
docs, and the issue caused a DoS of our Hadoop services.
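That fix can be sketched like this (hypothetical; the map-file path and "ip rack" format are assumptions, not our actual setup): look each IP up in a rack map and print a default rack when it is missing, so the script never returns fewer values than it was asked for:

```shell
#!/bin/sh
# Hypothetical sketch of the fix: look each argument up in an "ip rack"
# map file and print a default rack for unknown hosts, so the script
# always returns exactly one value per input and the namenode never
# sees "returned 0 values when 1 were expected".
RACK_MAP="${RACK_MAP:-/etc/hadoop/conf/topology.data}"  # assumed path
lookup_rack() {
  for host in "$@"; do
    awk -v h="$host" '
      $1 == h { print $2; found = 1 }
      END     { if (!found) print "/default-rack" }
    ' "$RACK_MAP"
  done
}
lookup_rack "$@"
```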

But I want to know why the rack awareness script is being called with
IPs of non-datanodes by our Hadoop namenode. Is this a design feature
of the YARN libraries? Why do non-datanode IPs need a rack ID?

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@hadoop.apache.org
For additional commands, e-mail: user-help@hadoop.apache.org
