hadoop-common-user mailing list archives

From Deepak Goel <deic...@gmail.com>
Subject Re: Performance Benchmarks on "Number of Machines"
Date Fri, 27 May 2016 18:56:30 GMT
My thoughts inline....



You could see it the other way around: it is enabling everyone to solve
problems that are too complex for one server.


*****

Deepak

*********


If one server (scaled up) provided the same scalability as several hundred
servers (scaled out), then the effective computing power of both would be
the same, and they would be able to solve the same complex problems. But it
is a problem in our software (OS, JVM, application) that prevents a single
server from scaling linearly, and hence from solving those complex problems.
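
One rough way to picture why the single server stops scaling is Amdahl's
law, which caps speedup by the serial fraction of the software. The sketch
below is illustrative only; the 95% parallel fraction is an assumed figure,
not a measurement:

// Illustrative only: Amdahl's law, one common model for why adding
// hardware to a single machine gives diminishing returns while part of
// the software stack (OS, JVM, application) stays serial.
public class Amdahl {
    static double speedup(double parallelFraction, int processors) {
        return 1.0 / ((1.0 - parallelFraction) + parallelFraction / processors);
    }

    public static void main(String[] args) {
        double p = 0.95; // assumed: 95% of the work parallelizes cleanly
        for (int n : new int[] {1, 2, 8, 64, 1024}) {
            System.out.printf("%5d processors -> %5.1fx speedup%n", n, speedup(p, n));
        }
        // Even with unlimited processors the speedup caps at 1/0.05 = 20x,
        // no matter how much hardware the single machine gets.
    }
}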



*****

Deepak

*********



Another way to look at it is that it reduces costs because scaling out is
much cheaper than scaling up.


*****

Deepak

*********


Do you have a cost analysis showing that scale out is cheaper than scale up
(considering everything: space, maintenance, the complexity of managing
multiple nodes, network costs between scale-out nodes, substandard nodes
crashing in a scale-out setup, etc.)?
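
For instance, a purely hypothetical back-of-envelope model (every figure
below is an assumption invented for illustration, not real pricing data)
shows how much the answer depends on those inputs:

// Hypothetical sketch; all numbers are assumptions, not real prices.
public class CostSketch {
    public static void main(String[] args) {
        int nodes = 50;
        double commodityNode = 5_000;   // assumed price per commodity server
        double bigIron = 400_000;       // assumed price of one scale-up box
        double opsPerNode = 500;        // assumed yearly space/admin/network cost
        double failureRate = 0.05;      // assumed yearly node-failure fraction

        double scaleOut = nodes * commodityNode
                        + nodes * opsPerNode                    // management overhead
                        + nodes * failureRate * commodityNode;  // replacing failed nodes
        System.out.printf("scale-out total: %.0f%n", scaleOut); // 287500 here
        System.out.printf("scale-up  total: %.0f%n", bigIron);
        // Change the assumed inputs and the verdict flips, which is
        // exactly why a real cost analysis matters.
    }
}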


*****

Deepak

*********



You can actually be (and usually have to be) pretty ingenious, and a good
software developer, if you want to do something in Hadoop. If you do a poor
programming job, you will not get the expected benefits.
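
A classic small instance of that point is the combiner in word count: one
line of job setup decides whether partial sums happen map-side or whether
every (word, 1) pair crosses the network during the shuffle. A sketch along
the lines of the stock Hadoop tutorial example:

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  public static class TokenizerMapper
      extends Mapper<Object, Text, Text, IntWritable> {
    private final static IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();
    @Override
    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, ONE);   // emits (word, 1) for every token
      }
    }
  }

  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable result = new IntWritable();
    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) sum += val.get();
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    // The one line a careless job often omits: without the combiner,
    // every single (word, 1) pair crosses the network in the shuffle.
    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

Drop the setCombinerClass line and the job still produces the same output,
just with far more shuffle traffic; that is the kind of gap between poor
and careful programming described above.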




Hey

Namaskara~Nalama~Guten Tag~Bonjour


   --
Keigu

Deepak
73500 12833
www.simtree.net, deepak@simtree.net
deicool@gmail.com

LinkedIn: www.linkedin.com/in/deicool
Skype: thumsupdeicool
Google talk: deicool
Blog: http://loveandfearless.wordpress.com
Facebook: http://www.facebook.com/deicool

"Contribute to the world, environment and more : http://www.gridrepublic.org
"

On Sat, May 28, 2016 at 12:20 AM, Matieu Bachant-Lagace <
matieu.bachant-lagace@ubisoft.com> wrote:

> You could see it the other way around: it is enabling everyone to solve
> problems that are too complex for one server.
>
>
>
> Another way to look at it is that it reduces costs because scaling out is
> much cheaper than scaling up.
>
>
>
> You can actually be (and usually have to be) pretty ingenious, and a good
> software developer, if you want to do something in Hadoop. If you do a
> poor programming job, you will not get the expected benefits.
>
>
>
> My 2c.
>
>
>
> Matieu
>
>
>
> *From:* Deepak Goel [mailto:deicool@gmail.com]
> *Sent:* 27 May 2016 14:45
> *To:* Arun Natva <arun.natva@gmail.com>
> *Cc:* user <user@hadoop.apache.org>
> *Subject:* Re: Performance Benchmarks on "Number of Machines"
>
>
>
> What I think, and I am sorry if I am wrong :-(
>
> In the cluster you are not only adding hardware (CPU, memory, disk), but
> you are also running separate software (OS, JVM, application) on each
> machine... So the reason the cluster scales linearly is not the hardware,
> but the separate software on each machine [as compared to a single
> machine, where you scale up (keep adding CPU, memory, and disk to the
> same machine) but the software remains the same (OS, JVM, application)]...
> So scale out (a cluster) has an advantage over scale up (a single machine
> with more hardware): the software set is separate for each machine.
>
> So Hadoop is making us worse software programmers overall, by giving us
> the facility to replicate the software across multiple machines (and of
> course providing reliability) :)
>
>
> Hey
>
> Namaskara~Nalama~Guten Tag~Bonjour
>
>
>    --
> Keigu
>
> Deepak
> 73500 12833
> www.simtree.net, deepak@simtree.net
> deicool@gmail.com
>
> LinkedIn: www.linkedin.com/in/deicool
> Skype: thumsupdeicool
> Google talk: deicool
> Blog: http://loveandfearless.wordpress.com
> Facebook: http://www.facebook.com/deicool
>
> "Contribute to the world, environment and more :
> http://www.gridrepublic.org
> "
>
>
>
> On Fri, May 27, 2016 at 11:40 PM, Arun Natva <arun.natva@gmail.com> wrote:
>
> Deepak,
>
> I believe Yahoo and Facebook have the largest clusters, over 4-5 thousand
> nodes in size.
>
> If you add a new server to the cluster, you are simply adding to the CPU,
> memory, and disk space of the cluster. So the capacity grows linearly as
> you add nodes, except that network bandwidth is shared.
>
>
>
> I didn't understand your last question on scaling...
>
>
>
>
> Sent from my iPhone
>
>
> On May 27, 2016, at 11:51 AM, Deepak Goel <deicool@gmail.com> wrote:
>
>
> Hey
>
> Namaskara~Nalama~Guten Tag~Bonjour
>
> Are there any performance benchmarks as to how many machines Hadoop can
> scale up to? Is the growth linear (for 1 machine, growth x; for 2
> machines, 2x growth; for 10,000 machines, 10,000x growth)?
>
> Also, does the scaling depend on the type of jobs and the amount of data?
> Or is it independent?
>
> Thank You
> Deepak
>    --
> Keigu
>
> Deepak
> 73500 12833
> www.simtree.net, deepak@simtree.net
> deicool@gmail.com
>
> LinkedIn: www.linkedin.com/in/deicool
> Skype: thumsupdeicool
> Google talk: deicool
> Blog: http://loveandfearless.wordpress.com
> Facebook: http://www.facebook.com/deicool
>
> "Contribute to the world, environment and more :
> http://www.gridrepublic.org
> "
>
>
>
