hadoop-user mailing list archives

From jay vyas <jayunit100.apa...@gmail.com>
Subject Re: What skills to Learn to become Hadoop Admin
Date Sat, 07 Mar 2015 16:34:09 GMT
Setting up vendor distros is a great first step.

1) Running TeraSort and benchmarking is a good step.  You can also run
larger, full-stack Hadoop applications like bigpetstore, which we curate
here: https://github.com/apache/bigtop/tree/master/bigtop-bigpetstore/.

2) Write some MapReduce or Spark jobs which write data to a persistent
transactional store, such as Solr or HBase.  This is a hugely important
part of real-world Hadoop administration, where you will encounter problems
like running out of memory, possibly CPU overclocking on some nodes, and so
on.

3) Now, did you want to go deeper into the build/setup/deployment of Hadoop?
It's worth trying to build/deploy/debug Hadoop ecosystem components from
scratch by setting up Apache BigTop, which packages RPM/DEB artifacts and
provides Puppet recipes for distributions.  It's the original root of both
the Cloudera and Hortonworks distributions, so you will learn something
about both by playing with it.

We have some exercises you can use to guide you and get started:
https://cwiki.apache.org/confluence/display/BIGTOP/BigTop+U%3A+Exersizes .
Feel free to join the mailing list if you have questions.
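
For the TeraSort benchmark in step 1, here is a rough sketch of how you might
size and assemble a TeraGen -> TeraSort -> TeraValidate run.  TeraGen writes
fixed 100-byte rows, so the row count follows from the data size you want.
The jar location and the HDFS directories below are assumptions, not
something your distro is guaranteed to use, and the helper names are made up
for illustration; adjust them for your cluster.

```python
# Sketch only: builds the three command lines, it does not run them.
ROW_BYTES = 100  # TeraGen writes fixed-size 100-byte rows

# ASSUMPTION: the examples jar lives at a different path on most distros.
EXAMPLES_JAR = "/usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar"


def rows_for_bytes(target_bytes):
    """Smallest number of 100-byte rows that covers target_bytes."""
    return -(-target_bytes // ROW_BYTES)  # ceiling division


def terasort_commands(target_bytes, base_dir="/benchmarks/terasort",
                      jar=EXAMPLES_JAR):
    """Return the teragen, terasort and teravalidate invocations in order.

    base_dir is a hypothetical HDFS directory chosen for this example.
    """
    rows = rows_for_bytes(target_bytes)
    gen_dir = base_dir + "/input"
    sort_dir = base_dir + "/output"
    report_dir = base_dir + "/report"
    return [
        ["hadoop", "jar", jar, "teragen", str(rows), gen_dir],
        ["hadoop", "jar", jar, "terasort", gen_dir, sort_dir],
        ["hadoop", "jar", jar, "teravalidate", sort_dir, report_dir],
    ]


if __name__ == "__main__":
    # Print the commands for a 10 GB run so they can be reviewed first.
    for cmd in terasort_commands(10 * 10**9):
        print(" ".join(cmd))
```

Start with a small size to check that the cluster works end to end, then
scale up and compare the job run times as you change configuration.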

On Sat, Mar 7, 2015 at 9:32 AM, max scalf <oracle.blog3@gmail.com> wrote:

> Krish,
> I don't mean to hijack your mail here, but I wanted to find out how/what you
> did for the portion below, as I am trying to go down your path as well. I
> was able to get a 4-5 node cluster using Ambari and CDH and now want to
> take it to the next level.  What have you done for the part below?
> "I have done a web log integration using flume and twitter sentiment
> analysis."
> On Sat, Mar 7, 2015 at 12:11 AM, Krish Donald <gotomypc27@gmail.com>
> wrote:
>> Hi,
>> I would like to enter the Big Data world as a Hadoop admin, and I have
>> set up a 7-node cluster using Ambari, Cloudera Manager and Apache Hadoop.
>> I have installed services like Hive, Oozie, ZooKeeper etc.
>> I have done a web log integration using flume and twitter sentiment
>> analysis.
>> I would like to understand what other skills I should learn.
>> Thanks
>> Krish

jay vyas
