hadoop-mapreduce-user mailing list archives

From Wilm Schumacher <wilm.schumac...@gmail.com>
Subject Re: New to this group.
Date Fri, 02 Jan 2015 20:19:21 GMT
So, if I may propose something:

If this is your goal, I would recommend something like an
»administration handbook for the "Hadoop world"«.

I hope I'm not stepping on anyone's toes, but the documentation for
running clusters of the different systems in the Hadoop ecosystem could
be better. For example, some parts of the Hadoop ecosystem depend
heavily on name resolution, so it is very important to know what you
are doing with DNS vs. /etc/hosts (dammit Ubuntu!!! *waving fist at
Canonical*), etc. Yet problems with name resolution are only mentioned
in the documentation, not really covered. Another example: running
independent ZooKeeper clusters and integrating them into other systems
is not well covered either. I always use the standard ports and hope
for the best :/. All this "real world" stuff.
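As an illustration of the kind of "real world" check such a handbook could include, here is a minimal Python 3 sketch (check_name_resolution is a hypothetical helper, not part of any Hadoop tool). It looks up a hostname, flags any loopback address (for example the 127.0.1.1 entry that Debian/Ubuntu installers commonly write into /etc/hosts for the machine's own hostname), and checks that a reverse lookup maps each address back to the same name. It only handles IPv4, and comparing just the first label of each name is a simplifying assumption; adapt it to your own naming scheme.

```python
from __future__ import annotations

import ipaddress
import socket


def check_name_resolution(hostname: str) -> list[str]:
    """Return a list of issues with how `hostname` resolves here (IPv4 only)."""
    issues: list[str] = []
    try:
        # Forward lookup: name -> IPv4 addresses (via /etc/hosts or DNS).
        _, _, addresses = socket.gethostbyname_ex(hostname)
    except OSError as exc:
        return [f"forward lookup of {hostname!r} failed: {exc}"]

    short_name = hostname.split(".")[0]
    for addr in addresses:
        # A node that resolves its own name to 127.x.x.x may end up
        # advertising an address that other nodes cannot reach.
        if ipaddress.ip_address(addr).is_loopback:
            issues.append(f"{hostname!r} resolves to loopback address {addr}")
            continue
        try:
            # Reverse lookup: address -> canonical hostname.
            reverse_name, _, _ = socket.gethostbyaddr(addr)
        except OSError:
            issues.append(f"no reverse lookup entry for {addr}")
            continue
        # Simplification: compare only the first label of each name.
        if reverse_name.split(".")[0] != short_name:
            issues.append(f"{addr} reverses to {reverse_name!r}, not {hostname!r}")
    return issues


if __name__ == "__main__":
    host = socket.getfqdn()
    problems = check_name_resolution(host)
    for problem in problems:
        print("WARNING:", problem)
    if not problems:
        print(f"{host}: forward and reverse lookups look consistent")
```

Running it on every node before installing anything would catch the 127.0.1.1 pitfall early; on a correctly configured node it should print no warnings for the node's FQDN.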

So a "cookbook" going from zero to a fancy cluster with HDFS, YARN,
HBase, Storm, Pig and Phoenix (which should cover 95% of all users),
covering not only the Hadoop stuff but also the other "administration"
concerns (DNS, Kerberos, rsync, firewall [BSD's pf, Linux's iptables]),
would be great.

Furthermore, Hadoop clusters can be tuned, and good administration can
improve performance drastically. It would be very good if you took
example applications (like "twitbase" from the book "HBase in Action",
which is basically a Twitter clone, or the weather data set from
"Hadoop: The Definitive Guide") and tuned the cluster for top
performance, so that others could study the example applications, use
them as templates for their own purposes, and tune their clusters
accordingly. As there are a lot of such examples out there, this would
be a great help for people trying to adjust their clusters, since they
would have a starting point for similar workloads.

This is in the "administration sphere" as you wish; new users could
benefit; it makes cluster set-up painless, which would help improve
acceptance of the Hadoop ecosystem; as you are a nice guy ;) and would
use a good licence, the official documentation would benefit; and it
would be something great for your resume.

Best wishes, and looking forward to »Krish's Hadoop cookbook« ;)


On 02.01.2015 at 20:43, Krish Donald wrote:
> I would like to go towards the administration side, not the development
> side, as I don't know Java at all...
> On Fri, Jan 2, 2015 at 11:37 AM, Jay Vyas <jayunit100.apache@gmail.com> wrote:
>     Many demos out there are for the business community... 
>     For a demonstration of Hadoop at a finer-grained level (how it's
>     deployed, packaged, installed and used) for a developer who wants
>     to learn Hadoop "the hard way",
>     I'd suggest:
>     1 - getting Apache Bigtop stood up on VMs, and
>     2 - running the BigPetStore application, which is meant to
>     demonstrate end-to-end building, testing and deployment of a Hadoop
>     batch analytics system with MapReduce, Pig, and Mahout.
>     This will also expose you to Puppet, Gradle, and Vagrant, all in a big
>     data app which solves real-world problems like jar dependencies
>     and multiple ecosystem components.
>     Since BPS generates its own data, you don't waste time worrying
>     about external data sets, Twitter credentials, etc., and can test
>     both on your laptop and on a 100-node cluster (similar to TeraGen
>     but for the whole ecosystem).
>     Since it features integration tests and is tested on Bigtop's Hadoop
>     distribution (which is 100% pure Apache based), it's IMO the
>     purest learning source, not blurred with company-specific
>     downloads or branding.
>     Disclaimer: of course I'm biased, as I work on it... :) But we've
>     been working hard to make Bigtop easily consumable as a gateway
>     drug to big data processing, and if you have a solid Linux and Java
>     background, I'm sure others would agree it's a great place to get
>     immersed in the Hadoop ecosystem.
>     On Jan 2, 2015, at 1:05 PM, Krish Donald <gotomypc27@gmail.com> wrote:
>>     I would like to work on some kind of case studies, like the couple
>>     I have seen on Hortonworks (Twitter sentiment analysis, web
>>     log analysis, etc.).
>>     But could somebody suggest other case studies which could
>>     be worked on and put on my resume later?
>>     I don't have real-world project experience.
>>     On Fri, Jan 2, 2015 at 10:33 AM, Ted Yu <yuzhihong@gmail.com> wrote:
>>         You can search for Open JIRAs which are related to admin.
>>         Here is an example query:
>>         https://issues.apache.org/jira/browse/HADOOP-9642?jql=project%20%3D%20HADOOP%20AND%20status%20%3D%20Open%20AND%20text%20~%20%22admin%22
>>         FYI
>>         On Fri, Jan 2, 2015 at 10:24 AM, Krish Donald
>>         <gotomypc27@gmail.com> wrote:
>>             I have a fair understanding of the Hadoop ecosystem...
>>             I have set up a multi-node cluster using VMs on my personal
>>             laptop for Hadoop 2.0.
>>             But beyond that I would like to work on some project to
>>             get a good hold on the subject.
>>             I would basically like to move into the Hadoop
>>             administration side, as my background is as an RDBMS
>>             database administrator.
>>             On Fri, Jan 2, 2015 at 10:11 AM, Wilm Schumacher
>>             <wilm.schumacher@gmail.com> wrote:
>>                 Hi,
>>                 the "standard" books may be a good start.
>>                 I liked the following:
>>                 Hadoop: The Definitive Guide:
>>                 http://www.amazon.de/Hadoop-Definitive-Guide-Tom-White/dp/1449311520
>>                 Hadoop in Action:
>>                 http://www.manning.com/lam2/
>>                 Hadoop in Practice:
>>                 http://www.manning.com/holmes2/
>>                 A list is here:
>>                 http://wiki.apache.org/hadoop/Books
>>                 Hope this helps.
>>                 Best wishes,
>>                 Wilm
>>                 On 02.01.2015 at 19:02, Krish Donald wrote:
>>                 > Hi,
>>                 >
>>                 > I am new to this group and to Hadoop.
>>                 > Please help me learn Hadoop and suggest some
>>                 > self-study projects.
>>                 >
>>                 > Thanks
>>                 > Krish Donald
