hadoop-common-dev mailing list archives

From Bharath Ravi <bharathra...@gmail.com>
Subject Re: Load balancing requests in HDFS
Date Wed, 19 Oct 2011 02:43:34 GMT
Thanks a lot Steve!

ReplicationTargetChooser seems to address load balancing for the initial
placement of data,
but it doesn't seem to actively balance incoming requests across
datanodes. Or does it?

Also, would you know if there are statistics on how effective
over-replication is for throughput gain?
Basically, although one might add more replicas, are they actually used
effectively to serve incoming requests?
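
To make the question concrete, here is a minimal sketch of the concern, not
HDFS code. It assumes a hypothetical client that either always reads from the
first listed replica or picks a replica uniformly at random; the function
name, policy names, and datanode names are all made up for illustration. It
only shows why extra replicas help throughput if requests are actually spread
among them.

```python
import random
from collections import Counter

def simulate_reads(num_requests, replicas, policy, seed=0):
    """Count how many reads each (hypothetical) datanode serves for one block."""
    rng = random.Random(seed)
    served = Counter({node: 0 for node in replicas})
    for _ in range(num_requests):
        if policy == "first":
            # Every client reads from the first listed replica.
            node = replicas[0]
        elif policy == "random":
            # Each client picks one replica uniformly at random.
            node = rng.choice(replicas)
        else:
            raise ValueError("unknown policy: " + policy)
        served[node] += 1
    return served

# A popular block that has been over-replicated to six datanodes.
replicas = ["dn1", "dn2", "dn3", "dn4", "dn5", "dn6"]
first = simulate_reads(6000, replicas, "first")
spread = simulate_reads(6000, replicas, "random")

print("busiest node, first-replica policy:", max(first.values()))
print("busiest node, random policy:", max(spread.values()))
```

Under the "first" policy one node serves every request regardless of the
replica count, so over-replication buys nothing; under the "random" policy
the busiest node serves roughly a sixth of the load.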

On 18 October 2011 12:37, Steve Loughran <stevel@apache.org> wrote:

> On 16/10/11 02:53, Bharath Ravi wrote:
>> Hi all,
>> I have a question about how HDFS load balances requests for files/blocks:
>> HDFS currently distributes data blocks randomly, for balance.
>> However, if certain files/blocks are more popular than others, some nodes
>> might get an "unfair" number of requests.
>> Adding more replicas for these popular files might not help, unless HDFS
>> explicitly distributes requests fairly among the replicas.
> Have a look at the ReplicationTargetChooser class; it does take datanode
> load into account, though its concern is distribution for data
> availability, not performance.
> The standard technique for popular files, including MR job JAR files, is to
> over-replicate. One problem: how to determine what is popular without adding
> more load on the namenode?

Bharath Ravi
