hadoop-common-user mailing list archives

From: Matieu Bachant-Lagace <matieu.bachant-lag...@ubisoft.com>
Subject: RE: Performance Benchmarks on "Number of Machines"
Date: Fri, 27 May 2016 18:50:03 GMT
You could see it the other way around: it enables everyone to solve problems that are too
complex for one server.

Another way to look at it is that it reduces costs because scaling out is much cheaper than
scaling up.

You can actually (and usually have to) be pretty ingenious, and you have to be a good software
developer, to get anything done in Hadoop. If you do a poor programming job, you will not get
the expected benefits.
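
For example, one classic detail that separates a careful job from a wasteful one is reusing
Writable objects in a mapper instead of allocating new ones for every record. Just a sketch of
mine (the class and variable names are invented, but the MapReduce API calls are the standard ones):

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class TokenCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

    // Reused across every call to map(); allocating fresh Writables for each
    // input record puts needless pressure on the garbage collector at scale.
    private final Text word = new Text();
    private static final IntWritable ONE = new IntWritable(1);

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        for (String token : value.toString().split("\\s+")) {
            if (!token.isEmpty()) {
                word.set(token);          // mutate the reused object
                context.write(word, ONE); // the framework serializes it out here
            }
        }
    }
}

On a million records the difference is invisible; on a few billion it shows up in GC time.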

My 2c.

Matieu

From: Deepak Goel [mailto:deicool@gmail.com]
Sent: 27 May 2016 14:45
To: Arun Natva <arun.natva@gmail.com>
Cc: user <user@hadoop.apache.org>
Subject: Re: Performance Benchmarks on "Number of Machines"

What I think, and I am sorry if I am wrong :-(

In the cluster you are not only adding hardware (CPU, memory, disk), you are also adding separate
software (OS, JVM, application). So the reason the cluster scales linearly is not the hardware,
but the separate software running on each machine [as compared to a single machine where you
scale up (keep adding CPU, memory, disk to the same machine) but the software remains the same
(OS, JVM, application)]. So scale-out (a cluster) has an advantage over scale-up (a single
machine with more hardware): the software set is different on each machine.

So Hadoop is making us bad software programmers overall, by giving us the facility to replicate
the software across multiple machines, and of course by providing reliability :)

Hey

Namaskara~Nalama~Guten Tag~Bonjour


   --
Keigu

Deepak
73500 12833
www.simtree.net, deepak@simtree.net
deicool@gmail.com

LinkedIn: www.linkedin.com/in/deicool
Skype: thumsupdeicool
Google talk: deicool
Blog: http://loveandfearless.wordpress.com
Facebook: http://www.facebook.com/deicool

"Contribute to the world, environment and more : http://www.gridrepublic.org
"

On Fri, May 27, 2016 at 11:40 PM, Arun Natva <arun.natva@gmail.com> wrote:
Deepak,
I believe Yahoo and Facebook have the largest clusters, over 4-5 thousand nodes in size.
If you add a new server to the cluster, you are simply adding to the CPU, memory, and disk space
of the cluster, so the capacity grows linearly as you add nodes, except that network bandwidth
is shared.
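
A toy back-of-envelope in Java (the per-node disk and fabric bandwidth numbers are invented,
just to illustrate that trade-off):

// Toy model: per-node resources add up linearly, while a fixed network
// fabric is shared by every node. The numbers are made up for illustration.
public class ClusterScaling {
    public static void main(String[] args) {
        double perNodeDiskTb = 48.0;   // assumed disk per node, in TB
        double fabricGbps = 40000.0;   // assumed total fabric bandwidth, in Gbit/s

        for (int nodes : new int[] {1, 10, 100, 1000, 5000}) {
            double totalDiskTb = nodes * perNodeDiskTb;   // grows linearly
            double perNodeGbps = fabricGbps / nodes;      // shrinks as the cluster grows
            System.out.printf("%5d nodes: %9.0f TB disk, %8.1f Gbit/s per node%n",
                    nodes, totalDiskTb, perNodeGbps);
        }
    }
}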

I didn't understand your last question on scaling...


Sent from my iPhone

On May 27, 2016, at 11:51 AM, Deepak Goel <deicool@gmail.com> wrote:

Hey

Namaskara~Nalama~Guten Tag~Bonjour

Are there any performance benchmarks for how many machines Hadoop can scale up to? Is the
growth linear (for 1 machine, growth x; for 2 machines, growth 2x; for 10,000 machines,
growth 10,000x)?

Also, does the scaling depend on the type of jobs and the amount of data? Or is it independent
of them?

Thank You
Deepak
