hadoop-hdfs-user mailing list archives

From "Ben Clay" <rbc...@ncsu.edu>
Subject RE: HDFS load balancing for non-local reads
Date Fri, 06 Jan 2012 19:56:12 GMT

Understood. We do not have a situation that extreme; I was just looking for
conceptual verification that reads are balanced across replicas of equal
distance. From the PDF you linked:

"For reading, the name node first checks if the client's computer is located
in the cluster. If yes, block locations are returned to the client in the
order of its closeness to the reader. The block is read from data nodes in
this preference order."

If two datanodes have equal closeness, I'd like to know how the NameNode
chooses between them.


-----Original Message-----
From: alo.alt [mailto:wget.null@googlemail.com] 
Sent: Friday, January 06, 2012 12:45 PM
To: hdfs-user@hadoop.apache.org
Subject: Re: HDFS load balancing for non-local reads


That scenario should not happen. If one DN has 20 clients and the other has
zero (for the same block), the cluster (or that DN) has another problem. Rack
Awareness is described here:

- Alex

Alexander Lorenz

On Jan 5, 2012, at 6:49 PM, Ben Clay wrote:

> Suresh-
> Thanks for the tips, I'll check those functions out, and examine plugging
> in a different NetworkTopology.
> So to clarify, under the current scheme, if we have 1 block on two local
> rack nodes A and B, it randomly chooses between those? IE, if DataNode A is
> serving 20 clients and DataNode B is serving 1 client, they both have a 50%
> chance of being selected for the 21st client?
> -Ben
> From: Suresh Srinivas [mailto:suresh@hortonworks.com] 
> Sent: Thursday, January 05, 2012 5:33 PM
> To: hdfs-user@hadoop.apache.org
> Subject: Re: HDFS load balancing for non-local reads
> Currently it sorts the block locations as:
> # local node
> # local rack node
> # random order of remote nodes
> See DatanodeManager#sortLocatedBlock(...) and
> You can play around with other policies by plugging in different
> On Thu, Jan 5, 2012 at 1:40 PM, Ben Clay <rbclay@ncsu.edu> wrote:
> Hi-
> How does the NameNode handle load balancing of non-local reads with
> multiple block locations when locality is equal?
> IE, if the client is equidistant (same rack) from 2 DataNodes hosting the
> same block, does the NameNode consider current client count or any other
> load indicators when deciding which DataNode will satisfy the read request?
> Or, is the client provided a list of all split locations and is allowed to
> make this choice themselves?
> Thanks!
> -Ben
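
For readers following the thread, the ordering Suresh lists (local node, then
local rack node, then remote nodes in random order) can be sketched roughly as
below. This is an illustrative Python sketch, not Hadoop's Java code: the Node
class, distance() and sort_replicas() are hypothetical names invented here, and
the three-level distance is a simplification. Replicas on the reader's rack are
left in their input order, because the thread does not establish how ties in
that tier are broken.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """Hypothetical stand-in for a DataNode: a host name plus its rack."""
    host: str
    rack: str


def distance(reader, replica):
    """Simplified closeness: 0 = same node, 1 = same rack, 2 = remote."""
    if replica.host == reader.host:
        return 0
    if replica.rack == reader.rack:
        return 1
    return 2


def sort_replicas(reader, replicas, rng=None):
    """Order replicas as: local node, local-rack nodes, shuffled remote nodes.

    Assumption: local-rack replicas keep their input order; only the
    remote tier is randomized, as in the ordering quoted in the thread.
    """
    if rng is None:
        rng = random.Random()
    local = [r for r in replicas if distance(reader, r) == 0]
    same_rack = [r for r in replicas if distance(reader, r) == 1]
    remote = [r for r in replicas if distance(reader, r) == 2]
    rng.shuffle(remote)  # random order among off-rack replicas
    return local + same_rack + remote
```

A client would then try the first entry of the returned list and fall back to
later entries on failure. Nothing in this sketch looks at the current client
count of a DataNode, which is the load-awareness question raised in the thread.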
