hadoop-user mailing list archives

From Marcos Ortiz <mlor...@uci.cu>
Subject Re: Hadoop problems
Date Sun, 17 Feb 2013 05:42:50 GMT
At the next ApacheCon, Kathleen Ting, one of Cloudera's Customer
Operations Engineers, will give a talk on this topic. I don't have the
exact link right now, but you can easily find it in the Big Data track
of the conference. She gave a similar talk at Hadoop World 2011; you
can see it here [1].

You should also read the book "Hadoop Operations" by Eric Sammer,
Engineering Manager at Cloudera and an expert in all this stuff.

Both of them consistently point out that cluster misconfiguration is
the primary cause of cluster failures. As you said, disk failure is a
possible cause too, but there are others:
- Disk full
- Too many open files for a particular user
- JVM and GC related issues
- Use of the OpenJDK VM instead of the Oracle Java VM
- NTP synchronization issues
- SSH related issues
- and many more
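As an illustration only, the first two items on that list (disk full, too many open files) are easy to script as a per-node check. The Python sketch below assumes a Unix-like node, because the `resource` module is Unix-only. The function names, the thresholds, and the choice of paths to check are hypothetical and are not taken from the talk or the book. It relies on the standard library calls `shutil.disk_usage` and `resource.getrlimit`.

```python
import resource
import shutil

# Hypothetical thresholds, for illustration only (not official Hadoop values).
MIN_FREE_FRACTION = 0.10        # warn when less than 10% of a volume is free
MIN_NOFILE_SOFT_LIMIT = 32768   # illustrative floor for the open-file limit


def disk_warnings(paths, min_free_fraction=MIN_FREE_FRACTION):
    """Return one warning string per path whose volume is nearly full."""
    warnings = []
    for path in paths:
        usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
        free_fraction = usage.free / usage.total
        if free_fraction < min_free_fraction:
            warnings.append(f"{path}: only {free_fraction:.1%} free")
    return warnings


def nofile_warning(min_limit=MIN_NOFILE_SOFT_LIMIT):
    """Return a warning if this process's open-file soft limit is too low."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft != resource.RLIM_INFINITY and soft < min_limit:
        return f"open-file soft limit is {soft}, below {min_limit}"
    return None


def check_node(paths):
    """Collect all warnings for the given volumes plus the open-file limit."""
    problems = disk_warnings(paths)
    nofile = nofile_warning()
    if nofile:
        problems.append(nofile)
    return problems


if __name__ == "__main__":
    # "/" is only a placeholder; on a real node you would pass the
    # site-specific HDFS and MapReduce data directories instead.
    for problem in check_node(["/"]):
        print("WARNING:", problem)
```

Note that the open-file check reports the limit of the process running the script. It only reflects the Hadoop daemons' limit if you run it as the same user, with the same login settings, that starts those daemons.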
[1] http://bit.ly/cloudera_talk

  Best wishes
On 16/02/2013 23:18, Henjarappa, Savitha wrote:
> All,
> What are the most common problems that a Hadoop administrator should
> be on top of?
> What would be the possible reasons for a job failure? I understand
> disk failure is one of the reasons.
> Thanks,
> Savitha

-- Marcos Ortíz Valmaseda
Product Manager && Data Scientist at UCI
Blog: http://marcosluis2186.posterous.com
LinkedIn: http://www.linkedin.com/in/marcosluis2186
Twitter: @marcosluis2186 <https://twitter.com/marcosluis2186>
