hadoop-hdfs-user mailing list archives

From "palmercliff@gmail.com" <palmercl...@gmail.com>
Subject Re: Questions about HDFS's placement policy
Date Fri, 16 Mar 2012 14:23:44 GMT
I recommend that you test your rack identification script, and test it under load. We encountered
similar, seemingly random block placement by HDFS and traced the cause to this script.
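For readers unsure what the rack identification script has to do: in Hadoop 0.20 it is the executable named by the `topology.script.file.name` property. Hadoop passes it one or more datanode hostnames or IP addresses as arguments and reads back one rack path per argument. The sketch below is a hypothetical Python version for a two-rack cluster like the one described further down; the IP addresses and the rack table are made up, and unknown hosts are mapped to `/default-rack`, Hadoop's default rack name.

```python
#!/usr/bin/env python3
"""Hypothetical rack-mapping (topology) script for a two-rack test cluster.

Hadoop runs the script named in topology.script.file.name with one or more
datanode hostnames/IPs as arguments and reads one rack path per argument
from stdout.
"""
import sys

# Hypothetical host-to-rack table; replace with your cluster's real layout.
# Note: if Hadoop passes hostnames instead of IPs, both forms must be listed.
RACKS = {
    "10.0.1.11": "/rack1",
    "10.0.1.12": "/rack1",
    "10.0.1.13": "/rack1",
    "10.0.2.11": "/rack2",
    "10.0.2.12": "/rack2",
    "10.0.2.13": "/rack2",
}

DEFAULT_RACK = "/default-rack"


def resolve(names):
    """Return exactly one rack path per input name, in input order."""
    return [RACKS.get(name, DEFAULT_RACK) for name in names]


def main(argv):
    # Print all answers space-separated; the number of rack names should
    # match the number of arguments so Hadoop can pair hosts with racks.
    print(" ".join(resolve(argv[1:])))


if __name__ == "__main__":
    main(sys.argv)
```

To test it as suggested, run it with the full host list (for example `./topology.py 10.0.1.11 10.0.2.12 unknown-host`, which prints `/rack1 /rack2 /default-rack`) and check that every host lands on the rack you expect, including while many invocations run at once.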
I hope this helps.

Sent from the desk of an overwhelmed engineer 

-----Original message-----
From: Giovanni Marzulli <giovanni.marzulli@ba.infn.it>
To: hdfs-user@hadoop.apache.org
Sent: Fri, Mar 16, 2012 09:29:54 EDT
Subject: Re: Questions about HDFS's placement policy

On 15/03/2012 00:14, Suresh Srinivas wrote:
> See my comments inline:
> On Wed, Mar 14, 2012 at 9:24 AM, Giovanni Marzulli
> <giovanni.marzulli@ba.infn.it> wrote:
>     Hello,
>     I'm trying out HDFS on a small test cluster and would like to clarify
>     a few points about Hadoop's behaviour.
>     Some details of my cluster:
>     Hadoop version: 0.20.2
>     I have two racks (rack1, rack2), with three datanodes per rack.
>     The replication factor is set to 3.
>     "HDFS’s placement policy is to put one replica on one node in the
>     local rack, another on a node in a different (remote) rack, and
>     the last on a different node in the same remote rack."
>     However, I noticed that a few blocks are sometimes stored as
>     follows: two replicas in the local rack and one replica in a
>     different rack. Are there exceptions that cause behaviour
>     different from the default placement policy?
> Your description of replica placement is correct. However, a node
> chosen by this policy may not be a good target because of the
> traffic on the node, its remaining space, etc. See
> BlockPlacementPolicyDefault#isGoodTarget(). Given the small cluster
> size, you may be seeing different placement depending on the load of
> individual nodes, racks, etc.
>     Likewise, some blocks are at times read from nodes in the remote
>     rack instead of nodes in the local rack. Why does this happen?
> This is surprising. Not s
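To make the point about good targets concrete, here is a simplified Python sketch, not Hadoop's actual code, of the default three-replica policy with a fallback, assuming that when no good target remains on the preferred rack the policy settles for any good node in the cluster. Under that assumption the third replica can land back on the writer's rack, giving the two-local/one-remote layout described above. The rack and node names are hypothetical, and `is_good` is a hypothetical predicate standing in for `isGoodTarget()`; the real policy chooses nodes randomly, while this sketch takes the first eligible node so its output is deterministic.

```python
from typing import Callable, Dict, List, Optional


def choose_targets(writer: str,
                   racks: Dict[str, List[str]],
                   is_good: Callable[[str], bool]) -> List[str]:
    """Simplified three-replica placement with fallback (illustrative only).

    racks maps a rack name to its datanodes; is_good is a hypothetical
    stand-in for BlockPlacementPolicyDefault#isGoodTarget().
    """
    rack_of = {node: rack for rack, nodes in racks.items() for node in nodes}
    all_nodes = [node for nodes in racks.values() for node in nodes]
    chosen: List[str] = []

    def pick(candidates: List[str]) -> Optional[str]:
        # Hadoop picks randomly; take the first eligible node instead.
        for node in candidates:
            if node not in chosen and is_good(node):
                return node
        return None

    def off_rack(rack: str) -> List[str]:
        return [n for n in all_nodes if rack_of[n] != rack]

    def place(preferred: List[str]) -> None:
        # Try the preferred nodes first; if none is a good target, fall
        # back to any good node in the cluster, whatever its rack.
        node = pick(preferred)
        if node is None:
            node = pick(all_nodes)
        if node is not None:
            chosen.append(node)

    # Replica 1: the writer's own node, else another node on its rack.
    place([writer] + racks[rack_of[writer]])
    if not chosen:
        return chosen
    first = chosen[0]
    # Replica 2: a node on a different (remote) rack.
    place(off_rack(rack_of[first]))
    if len(chosen) < 2:
        return chosen
    second = chosen[1]
    # Replica 3: same rack as replica 2, unless 1 and 2 already share a rack.
    if rack_of[first] == rack_of[second]:
        place(off_rack(rack_of[first]))
    else:
        place(racks[rack_of[second]])
    return chosen
```

With `racks = {"rack1": ["d1", "d2", "d3"], "rack2": ["d4", "d5", "d6"]}` and every node healthy, `choose_targets("d1", racks, lambda n: True)` returns `['d1', 'd4', 'd5']`. If `d5` and `d6` are busy (`lambda n: n not in {"d5", "d6"}`), it returns `['d1', 'd4', 'd2']`: two replicas on rack1 and one on rack2.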