hadoop-mapreduce-user mailing list archives

From Jeff Kubina <jeff.kub...@gmail.com>
Subject Re: Hadoop Scalability
Date Sat, 19 Jan 2013 21:20:06 GMT
Thiago, when addressing scaling you want to consider whether the algorithm
scales and, if so, whether the system's architecture enables the algorithm to
scale. That is, if the algorithm scales on paper, does it also scale on the
hardware?

Algorithms that communicate an amount of data bounded by a constant will
scale on just about any Hadoop cluster, up to a point. At about 4000 nodes
the namenode server may start to become overwhelmed (a bottleneck) and
slow processing down considerably. I think this bottleneck is eliminated in
a not-too-distant release of HDFS.

If the amount of data the algorithm communicates is proportional to the
number of processors (map or reduce tasks), then the network bandwidth of
the cluster must increase in proportion to the number of processors (since
Hadoop is based on the bulk synchronous parallel
<http://en.wikipedia.org/wiki/Bulk_synchronous_parallel> model) to achieve
scaling. In such cases a low-bandwidth network will impede scaling. Bryan
Duxbury has a nice blog post about networking a Hadoop cluster here
<http://goo.gl/uVeoM>.
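
As a rough illustration of the two cases above, here is a toy cost model in
Python. It is only a sketch: the formula (compute time plus communication
time per superstep), the constants, and all of the function names are my
own assumptions for illustration, not measurements from any real cluster.

```python
# Toy BSP-style cost model. All numbers are made up for illustration.
WORK = 1000.0  # total compute work, arbitrary units (assumed)


def job_time(nodes, traffic, bandwidth):
    """Time for one superstep: parallel compute plus data exchange."""
    return WORK / nodes + traffic / bandwidth


def speedup(nodes, traffic_fn, bandwidth_fn):
    """Speedup of `nodes` nodes relative to a single node."""
    t1 = job_time(1, traffic_fn(1), bandwidth_fn(1))
    tn = job_time(nodes, traffic_fn(nodes), bandwidth_fn(nodes))
    return t1 / tn


def constant_traffic(n):
    return 10.0          # data communicated is bounded by a constant


def growing_traffic(n):
    return 100.0 * n     # data communicated grows with the node count


def fixed_network(n):
    return 100.0         # aggregate bandwidth does not grow


def scaling_network(n):
    return 100.0 * n     # aggregate bandwidth grows with the node count


if __name__ == "__main__":
    cases = [
        ("constant traffic, fixed network", constant_traffic, fixed_network),
        ("growing traffic, fixed network", growing_traffic, fixed_network),
        ("growing traffic, scaling network", growing_traffic, scaling_network),
    ]
    for label, traffic_fn, bandwidth_fn in cases:
        s10 = speedup(10, traffic_fn, bandwidth_fn)
        s20 = speedup(20, traffic_fn, bandwidth_fn)
        print(f"{label}: speedup(10)={s10:.2f}, speedup(20)={s20:.2f}")
```

Under these assumptions, only the middle case (traffic grows, network does
not) falls well short of doubling when going from 10 to 20 nodes, which
looks like the behaviour you describe.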

More concisely, I would say that "Hadoop scales on clusters with networks
that scale (up to ~4000 nodes)."
-- 
Jeff Kubina

On Thu, Jan 17, 2013 at 10:09 PM, Thiago Vieira <tpbvieira@gmail.com> wrote:

> Hello!
>
> It is common to see this sentence: "Hadoop Scales Linearly". But is there
> any performance evaluation to confirm this?
>
> In my evaluations, Hadoop processing capacity scales linearly, but not
> in proportion to the number of nodes: the processing capacity achieved with
> 20 nodes is not double the processing capacity achieved with 10 nodes.
> Is there any evaluation of this?
>
> Thank you!
>
> --
> Thiago Vieira
>
