hadoop-common-dev mailing list archives

From Bharath Ravi <bharathra...@gmail.com>
Subject Re: Load balancing requests in HDFS
Date Wed, 19 Oct 2011 23:37:23 GMT
That clarified it, thanks a lot!

I'm trying to write a load balancer that prioritises datanodes by load
(over the last t minutes, say) as well as proximity to the client, instead
of proximity alone. That is, if the closest datanode holding a block is
facing a load above some threshold, the balancer picks the next closest
node that is not too heavily loaded.
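A minimal sketch of that selection policy, assuming per-node load statistics are already available. The class and field names (DataNodeStat, distance, load) are illustrative, not HDFS APIs:

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

// Hypothetical sketch: prefer the closest datanode, but skip any whose
// recent load exceeds a threshold, falling back to the least-loaded node.
class LoadAwareChooser {
    static class DataNodeStat {
        final String name;
        final int distance;   // network distance to the client (smaller = closer)
        final double load;    // e.g. requests served in the last t minutes
        DataNodeStat(String name, int distance, double load) {
            this.name = name; this.distance = distance; this.load = load;
        }
    }

    static DataNodeStat choose(List<DataNodeStat> replicas, double loadThreshold) {
        // Closest replica whose load is under the threshold...
        Optional<DataNodeStat> ok = replicas.stream()
            .filter(d -> d.load < loadThreshold)
            .min(Comparator.comparingInt(d -> d.distance));
        // ...otherwise fall back to the least-loaded replica.
        return ok.orElseGet(() -> replicas.stream()
            .min(Comparator.comparingDouble(d -> d.load))
            .orElse(null));
    }
}
```

The fallback matters: if every replica is above the threshold, the request still has to go somewhere, so the least-loaded node is the natural default.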

Sounds like this could work well in conjunction with
ReplicationTargetChooser, since it handles write requests, which is
another side of the same coin.

Thanks, again!

On 19 October 2011 02:04, Uma Maheswara Rao G 72686 <maheswara@huawei.com> wrote:

> ----- Original Message -----
> From: Bharath Ravi <bharathravi1@gmail.com>
> Date: Wednesday, October 19, 2011 8:16 am
> Subject: Re: Load balancing requests in HDFS
> To: common-dev@hadoop.apache.org
>
> > Thanks a lot Steve!
> >
> > ReplicationTargetChooser seems to address load balancing for
> > initially placing/laying out data, but it doesn't seem to do active
> > load balancing for incoming requests to a datanode: or does it?
>
> For every request, ReplicationTargetChooser will check for good targets to
> write (space, traffic, thread count on the DN, etc.). DNs update their
> statistics via heartbeats, so the NN can check these before actually
> choosing the target to write the data.
> Hope this clarifies your doubt.
>
> >
> > Also, would you know if there are statistics on how effective
> > over-replication is for throughput gain?
> > Basically, although one might add more replicas, are they actually
> > used effectively to serve incoming requests?
> Here, over-replication means upping the replication factor, is it?
>
> >
> > On 18 October 2011 12:37, Steve Loughran <stevel@apache.org> wrote:
> >
> > > On 16/10/11 02:53, Bharath Ravi wrote:
> > >
> > >> Hi all,
> > >>
> > >> I have a question about how HDFS load balances requests for
> > >> files/blocks:
> > >>
> > >> HDFS currently distributes data blocks randomly, for balance.
> > >> However, if certain files/blocks are more popular than others,
> > >> some nodes might get an "unfair" number of requests.
> > >> Adding more replicas for these popular files might not help,
> > >> unless HDFS explicitly distributes requests fairly among the replicas.
> > >>
> > >
> > > Have a look at the ReplicationTargetChooser class; it does take
> > > datanode load into account, though its concern is distribution
> > > for data availability, not performance.
> > >
> > > The standard technique for popular files - including MR job JAR
> > > files - is to over-replicate. One problem: how to determine what
> > > is popular without adding more load on the namenode.
> > >
> >
> >
> >
> > --
> > Bharath Ravi
> >
>
> Regards,
> Uma
>



-- 
Bharath Ravi
