hadoop-hdfs-user mailing list archives

From Giovanni Marzulli <giovanni.marzu...@ba.infn.it>
Subject Re: Questions about HDFS's placement policy
Date Fri, 16 Mar 2012 13:29:54 GMT
On 15/03/2012 00:14, Suresh Srinivas wrote:
> See my comments inline:
> On Wed, Mar 14, 2012 at 9:24 AM, Giovanni Marzulli
> <giovanni.marzulli@ba.infn.it> wrote:
>     Hello,
>     I'm trying out HDFS on a small test cluster and I need to clear up
>     some doubts about Hadoop's behaviour.
>     Some details of my cluster:
>     Hadoop version: 0.20.2
>     I have two racks (rack1, rack2). Three datanodes for every rack.
>     Replication factor is set to 3.
>     "HDFS’s placement policy is to put one replica on one node in the
>     local rack, another on a node in a different (remote) rack, and
>     the last on a different node in the same remote rack."
>     Instead, I noticed that sometimes a few blocks of a file are
>     stored as follows: two replicas in the local rack and one replica
>     in a different rack. Are there exceptions that cause behaviour
>     that differs from the default placement policy?
> Your description of replica placement is correct. However a node 
> chosen based on this placement may not be a good target, due to the 
> traffic on the node, remaining space etc. See 
> BlockPlacementPolicyDefault#isGoodTarget(). Given the small cluster 
> size, you may be seeing different behavior based on load of individual 
> nodes, racks etc.
>     Likewise, at times some blocks are read from nodes in the remote
>     rack instead of nodes in the local rack. Why does it happen?
> This is surprising. Not sure if the topology is correctly configured.
>     Another thing: if I have two datacenters and two racks in each of
>     them (so a hierarchical network topology), where are the two
>     remote replicas stored? Does Hadoop consider the hierarchy and
>     store one replica in the local datacenter and two replicas in the
>     other datacenter? Or are the two replicas stored in a totally
>     random rack?
> Hadoop clusters are not spread across datacenters.
When I mentioned datacenters, it was just an example. Let me rephrase the question.
If I have this network topology:


and I write a file from a node in rack2 (rackA), the first replica
will be stored on rack2; where will the other two replicas be stored?
In rackA, in rackB, or in a random rack? So, what is the placement policy
in a hierarchical network topology?
> Regards,
> Suresh
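
For illustration only, the behaviour discussed in this thread can be sketched roughly as follows. This is not Hadoop's actual code: `choose_targets`, `pick`, the `topology` dictionary and the `is_good` callback are hypothetical names, with `is_good` standing in for the kind of check Suresh attributes to BlockPlacementPolicyDefault#isGoodTarget(). The fallback to any remaining node, and taking the first acceptable candidate instead of a random one, are assumptions made to keep the sketch small and deterministic. It covers only the flat two-rack case with three replicas.

```python
from typing import Callable, Dict, List, Optional


def pick(candidates: List[str], chosen: List[str],
         is_good: Callable[[str], bool]) -> Optional[str]:
    """Return the first candidate that is not yet chosen and passes is_good."""
    for node in candidates:
        if node not in chosen and is_good(node):
            return node
    return None


def choose_targets(topology: Dict[str, List[str]], local_rack: str,
                   is_good: Callable[[str], bool] = lambda node: True) -> List[str]:
    """Pick up to three replica targets following the quoted policy.

    topology maps a rack name to its datanodes (hypothetical structure).
    If no acceptable node exists in the preferred rack, any other
    acceptable node is used instead (an assumption of this sketch).
    """
    all_nodes = [n for nodes in topology.values() for n in nodes]
    rack_of = {n: rack for rack, nodes in topology.items() for n in nodes}
    chosen: List[str] = []

    def add(preferred: List[str]) -> Optional[str]:
        node = pick(preferred, chosen, is_good) or pick(all_nodes, chosen, is_good)
        if node is not None:
            chosen.append(node)
        return node

    # 1st replica: a node in the local rack.
    add(topology[local_rack])
    # 2nd replica: a node in a different (remote) rack.
    remote = [n for rack, nodes in topology.items() if rack != local_rack
              for n in nodes]
    second = add(remote)
    # 3rd replica: a different node in the same rack as the 2nd replica.
    add(topology[rack_of[second]] if second is not None else [])
    return chosen


topology = {"rack1": ["a1", "a2", "a3"], "rack2": ["b1", "b2", "b3"]}
print(choose_targets(topology, "rack1"))
print(choose_targets(topology, "rack1", lambda n: n not in {"b2", "b3"}))
```

With every node acceptable, the sketch yields one local and two remote replicas (`a1`, `b1`, `b2`). When `b2` and `b3` are rejected, the third replica falls back to `a2`, giving two replicas in the local rack and one in the remote rack, which is the pattern described in the original question.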
