hadoop-common-user mailing list archives

From Raj V <rajv...@yahoo.com>
Subject Re: risks of using Hadoop
Date Wed, 21 Sep 2011 22:06:17 GMT
I have been following this thread. Over the last two years that I have been using Hadoop with
a fairly large cluster, my biggest problem has been analyzing failures. In the beginning the
failures were fairly simple - an unformatted name node, task trackers not starting, heap
allocation mistakes, version id mismatches, configuration mistakes - all easily fixed using
this group's help and/or by analyzing logs. Then the errors got a little more complicated -
"too many fetch failures", "task exited with error code 134", "error reading task output",
etc. - where the logs were less useful and this mailing list and the source became more
useful; and given that I am not a Java expert, I needed to rely on this group more and more.
There are wonderful people like Harsha, Steve and Todd who sincerely and correctly answer many
queries. But this is a complex system with so many knobs and so many variables that knowing
all possible failures is probably close to impossible. And this is just the framework. If you
combine this with all the esoteric industries that Hadoop is used in, the complexity increases
because of the domain expertise required.

We won't even touch the voodoo magic that is involved in optimizing Hadoop runs.
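
(Just to give a flavor of those knobs, here is a rough sketch in the 0.20-era Java API of a few
of the properties people commonly end up tuning. The property names come from the stock
configuration, but the class name and the values are purely illustrative, not recommendations
for any particular cluster.)

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class TunedJobSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Heap for each task's child JVM.
    conf.set("mapred.child.java.opts", "-Xmx1024m");
    // Map-side sort buffer size, in MB.
    conf.set("io.sort.mb", "200");
    // Parallel copies each reduce uses to fetch map output during the shuffle.
    conf.set("mapred.reduce.parallel.copies", "10");

    Job job = new Job(conf, "tuned-job-sketch");
    job.setNumReduceTasks(8);  // illustrative value only
    // ... mapper, reducer, input and output paths would be set here ...
  }
}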

So to mitigate the risk of running Hadoop you need someone with four heads: the domain head,
who can think about and solve domain problems; the Hadoop head, who translates those problems
into M/R; the Java head, who understands Java and can take a shot at looking at the source
code and finding solutions to problems; and the system head, who keeps the cluster buzzing
along smoothly. Unless you have these heads, or are able to get them as required, there is
some definite risk.
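
(To make the "Hadoop head" part concrete, here is the canonical word-count style of job
written against the 0.20-era org.apache.hadoop.mapreduce API - a minimal sketch of what
"translating a problem into M/R" looks like, not code from my cluster.)

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Map: emit (word, 1) for every token in the input line.
  public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
    private final static IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, ONE);
      }
    }
  }

  // Reduce: sum the counts for each word.
  public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      context.write(key, new IntWritable(sum));
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = new Job(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}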

Thanks once again to this wonderful group and to the many active people like Todd, Harsha,
Steve and many others who have helped me and others get over that stumbling block.


>________________________________
>From: Ahmed Nagy <ahmed.nagy@gmail.com>
>To: common-user@hadoop.apache.org
>Sent: Wednesday, September 21, 2011 2:02 AM
>Subject: Re: risks of using Hadoop
>
>Another way to decrease the risks is just to use Amazon Web Services. That
>might be a bit expensive.
>
>On Sun, Sep 18, 2011 at 12:11 AM, Brian Bockelman <bbockelm@cse.unl.edu>
>wrote:
>>
>>
>> On Sep 16, 2011, at 11:08 PM, Uma Maheswara Rao G 72686 wrote:
>>
>> > Hi Kobina,
>> >
>> > Some experiences which may be helpful for you with respect to DFS.
>> >
>> > 1. Selecting the correct version.
>> >    I would recommend using the 0.20.x version. It is a pretty stable
>> >    version, other organizations prefer it, and it is well tested as well.
>> >    Don't go for the 0.21 version. It is not a stable release; it is a risk.
>> >
>> > 2. You should perform thorough tests with your customer operations.
>> >    (Of course you will do this :-))
>> >
>> > 3. The 0.20.x version has the problem of a SPOF.
>> >    If the NameNode goes down you will lose the data. One way of
>> >    recovering is by using the SecondaryNameNode: you can recover the
>> >    data up to the last checkpoint, but manual intervention is required.
>> >    In the latest trunk the SPOF will be addressed by HDFS-1623.
>> >
>> > 4. 0.20.x NameNodes cannot scale. Federation changes are included in
>> >    the latest versions (I think in 0.22). This may not be a problem for
>> >    your cluster, but please consider this aspect as well.
>> >
>>
>> With respect to (3) and (4) - these are often completely overblown for
>> many Hadoop use cases.  If you use Hadoop as originally designed (large
>> scale batch data processing), these likely don't matter.
>>
>> If you're looking at some of the newer use cases (low latency stuff or
>> time-critical processing), or if you architect your solution poorly (lots
>> of small files), these issues become relevant.  Another case where I see
>> folks get frustrated is using Hadoop as a "plain old batch system"; for
>> non-data workflows, it doesn't measure up against specialized systems.
>>
>> You really want to make sure that Hadoop is the best tool for your job.
>>
>> Brian
>
>
>