hadoop-mapreduce-user mailing list archives

From Dejan Menges <dejan.men...@gmail.com>
Subject Re: Reliability of Hadoop
Date Fri, 27 May 2016 19:47:15 GMT
Hi Deepak,

Hadoop is just a platform (Hadoop and everything around it), a toolset to do
what you want to do.

If you write bad code, you can't blame the programming language; it's you not
being able to write good code. There's also nothing wrong with using
commodity hardware (and I'm not sure I understand what "commodity software"
is supposed to mean). At this very moment, while we are exchanging these
emails, how much do we know or care about which hardware the mail servers are
running on? We don't know, and we don't care.

As for whitepapers and use cases, the internet is full of them.

My company keeps the majority of its really important data in the Hadoop
ecosystem. Some of the best software developers I have met so far write all
sorts of code on top of it, from analytics to in-house software and plugins
for different things.

However, I'm not sure that anyone on any mailing list can give you the
answers you need. I would start with the official documentation and with
understanding how each specific component works in depth and why it works the
way it does.

My 2c

Cheers,
Dejan

On Fri, May 27, 2016 at 9:41 PM Deepak Goel <deicool@gmail.com> wrote:

> Sorry once again if I am wrong, or if my comments are without significance.
>
> I am not saying Hadoop is bad or good... It is just that Hadoop might be
> indirectly encouraging the development of commodity hardware and software,
> which is convenient but might not be very good (and the cost benefit is
> unproven, with no proper case studies or whitepapers).
>
> It is like the fast food industry, which is very convenient (a commodity)
> but is causing obesity all over the world (and hence also many illnesses,
> poor health, and social trauma, so the real cost of a burger to anyone is
> actually far more than what the company charges when you eat it).
>
> In effect, what Hadoop (and all the other commercial software around it) is
> saying is that it's OK if you have bad software (application, JVM, OS); I
> will provide another piece of software that will hide all of your
> problems... We might all just go the obesity way in the software industry
> too.
>
>
>
> Hey
>
> Namaskara~Nalama~Guten Tag~Bonjour
>
>
>
>    --
> Keigu
>
> Deepak
> 73500 12833
> www.simtree.net, deepak@simtree.net
> deicool@gmail.com
>
> LinkedIn: www.linkedin.com/in/deicool
> Skype: thumsupdeicool
> Google talk: deicool
> Blog: http://loveandfearless.wordpress.com
> Facebook: http://www.facebook.com/deicool
>
> "Contribute to the world, environment and more :
> http://www.gridrepublic.org
> "
>
> On Sat, May 28, 2016 at 12:51 AM, J. Rottinghuis <jrottinghuis@gmail.com>
> wrote:
>
>> We run several clusters of thousands of nodes (as do many companies); our
>> largest one has over 10K nodes. Disks, machines, memory, and network links
>> fail all the time. The larger the scale, the higher the odds that some
>> machine is bad on a given day. On the other hand, scale helps. If a single
>> node out of 10K fails, 9,999 others participate in re-distributing its
>> state. Even a rack failure isn't a big deal most of the time (plus a rack
>> typically fails due to a top-of-rack (TOR) switch issue, so the data is
>> offline but typically not lost permanently).
>>
>> Hadoop is designed to deal with this, and by and large it does. Critical
>> components (such as the NameNode) can be configured to run as an HA pair
>> with automatic failover. There is quite a bit of work going on by many in
>> the Hadoop community to keep pushing the boundaries of scale.
>>
>> A node or a rack failing in a large cluster actually has less impact than
>> at smaller scale. With a 5-node cluster, if 1 machine crashes you've taken
>> 20% of your capacity (disk and compute) offline; 1 node out of 1K barely
>> registers. Ditto with a 3-rack cluster: lose a rack and 1/3 of your
>> capacity is offline.
>>
>> It is large-scale coordinated failure you should worry about. Think of
>> several rows of racks going offline due to a power failure, or a DC going
>> offline due to a fire in the building. Those are hard to deal with in
>> software within a single DC. They should also be rarer, but as many
>> companies have experienced, large-scale coordinated failures do
>> occasionally happen.
>>
>> As to your question in the other email thread, it is a well-established
>> pattern that scaling horizontally with commodity hardware (and letting
>> software such as Hadoop deal with failures) helps with both scale and
>> reducing cost.
>>
>> Cheers,
>>
>> Joep
>>
>>
>> On Fri, May 27, 2016 at 11:02 AM, Arun Natva <arun.natva@gmail.com>
>> wrote:
>>
>>> Deepak,
>>> I have managed clusters where worker nodes crashed and disks failed.
>>> HDFS takes care of the data replication unless you lose so many nodes
>>> that there is not enough space left to fit the replicas.
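>>>
>>> As a small illustration of that (a sketch only; the defaultFS, path, and
>>> replication factor here are made up), the replication factor is just a
>>> per-file setting, and the NameNode re-creates any missing replicas from
>>> the surviving copies once it declares a DataNode dead:
>>>
>>>   import org.apache.hadoop.conf.Configuration;
>>>   import org.apache.hadoop.fs.FileStatus;
>>>   import org.apache.hadoop.fs.FileSystem;
>>>   import org.apache.hadoop.fs.Path;
>>>
>>>   public class ReplicationSketch {
>>>     public static void main(String[] args) throws Exception {
>>>       Configuration conf = new Configuration();
>>>       conf.set("fs.defaultFS", "hdfs://namenode.example.com:8020"); // placeholder
>>>       // dfs.replication (default 3) controls how many copies of each block
>>>       // new files get, spread across DataNodes (and racks, with rack awareness).
>>>       conf.setInt("dfs.replication", 3);
>>>
>>>       FileSystem fs = FileSystem.get(conf);
>>>       Path file = new Path("/data/important.dat"); // hypothetical file
>>>
>>>       // Ask for an extra copy of an especially important file; the NameNode
>>>       // schedules the additional replica in the background.
>>>       fs.setReplication(file, (short) 4);
>>>
>>>       FileStatus status = fs.getFileStatus(file);
>>>       System.out.println("replication factor: " + status.getReplication());
>>>       fs.close();
>>>     }
>>>   }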
>>>
>>>
>>>
>>> Sent from my iPhone
>>>
>>> On May 27, 2016, at 11:54 AM, Deepak Goel <deicool@gmail.com> wrote:
>>>
>>>
>>> Hey
>>>
>>> Namaskara~Nalama~Guten Tag~Bonjour
>>>
>>> We are yet to see any server go down among our cluster nodes in the
>>> production environment. Has anyone seen reliability problems in their
>>> production environment? How many times?
>>>
>>> Thanks
>>> Deepak
>>>    --
>>> Keigu
>>>
>>> Deepak
>>> 73500 12833
>>> www.simtree.net, deepak@simtree.net
>>> deicool@gmail.com
>>>
>>> LinkedIn: www.linkedin.com/in/deicool
>>> Skype: thumsupdeicool
>>> Google talk: deicool
>>> Blog: http://loveandfearless.wordpress.com
>>> Facebook: http://www.facebook.com/deicool
>>>
>>> "Contribute to the world, environment and more :
>>> http://www.gridrepublic.org
>>> "
>>>
>>>
>>
>
