hive-user mailing list archives

From André Hacker <>
Subject Re: Learn Java for Hadoop
Date Fri, 15 Aug 2014 19:09:34 GMT
Hi Saurabh,

I am not aware of a Java tutorial written specifically for Hadoop, but I can
give you an (incomplete) shortlist of things you should at least know about,
based on my own experience with Hadoop-related Java development. A web
search will turn up good tutorials for each topic.

1) There is no way around learning the Java fundamentals. Make sure you
understand the basics of inheritance, interfaces, generics, and collections
(List, Map).
It is also beneficial to understand the classpath concept and how to build
with javac.
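
To make the list in 1) a bit more concrete, here is a minimal sketch that
touches an interface, inheritance, generics, and List/Map in one place. All
names (Keyed, BaseEvent, ClickEvent, FundamentalsSketch) are made up for
illustration and are not part of Hadoop or any other library.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Interface: a contract that any keyed item must fulfil (hypothetical name).
interface Keyed {
    String key();
}

// Inheritance: the abstract base class implements the interface once.
abstract class BaseEvent implements Keyed {
    private final String key;

    BaseEvent(String key) {
        this.key = key;
    }

    @Override
    public String key() {
        return key;
    }
}

// A concrete subclass that adds its own field.
class ClickEvent extends BaseEvent {
    final int clicks;

    ClickEvent(String key, int clicks) {
        super(key);
        this.clicks = clicks;
    }
}

class FundamentalsSketch {
    // Generics: works for any type T that implements Keyed.
    // Collections: groups the items of a List into a Map keyed by key().
    static <T extends Keyed> Map<String, List<T>> groupByKey(List<T> items) {
        Map<String, List<T>> groups = new HashMap<>();
        for (T item : items) {
            groups.computeIfAbsent(item.key(), k -> new ArrayList<>()).add(item);
        }
        return groups;
    }

    public static void main(String[] args) {
        List<ClickEvent> events = new ArrayList<>();
        events.add(new ClickEvent("home", 3));
        events.add(new ClickEvent("search", 1));
        events.add(new ClickEvent("home", 5));

        Map<String, List<ClickEvent>> grouped = groupByKey(events);
        for (Map.Entry<String, List<ClickEvent>> e : grouped.entrySet()) {
            System.out.println(e.getKey() + " has " + e.getValue().size() + " event(s)");
        }
    }
}
```

Grouping values by key is also roughly what happens between the map and
reduce steps of a MapReduce job, so the pattern is worth getting comfortable
with.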

2) Get familiar with an IDE and a build/dependency management tool, e.g.
Eclipse/IntelliJ and Maven. Git is also useful.
As an exercise, try to build a project like Storm from source (using
GitHub/Maven). Also try to import it into the IDE as a Maven project and
browse around the sources just for fun.

3) The strength of Java is its ecosystem and the thousands of libraries
available. You should budget some time for learning some of them.
As an exercise, try to use a logging library like slf4j/log4j. Or play
around with something more Hadoop-related like Avro/Thrift.

4) Start with the examples
There are many example or starter applications out there for Kafka, Storm,
Mahout, etc. You could also start with the old-school MapReduce WordCount.
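
To give an idea of what WordCount does before you touch the real thing, here
is a plain-Java sketch of its two phases. It deliberately does not use the
Hadoop API (no Mapper/Reducer classes, no job setup); the class and method
names are made up, and everything runs in memory on a single machine.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical class: only mimics the shape of a MapReduce word count.
class WordCountSketch {

    // "Map" phase: emit a (word, 1) pair for every token in one input line.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String token : line.toLowerCase().split("\\s+")) {
            if (!token.isEmpty()) {
                pairs.add(new SimpleEntry<>(token, 1));
            }
        }
        return pairs;
    }

    // "Reduce" phase: sum up all the counts that share the same word.
    static Map<String, Integer> reduce(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> totals = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            totals.merge(pair.getKey(), pair.getValue(), Integer::sum);
        }
        return totals;
    }

    // Glue: run the map step on every line, then reduce all emitted pairs.
    static Map<String, Integer> count(List<String> lines) {
        List<Map.Entry<String, Integer>> emitted = new ArrayList<>();
        for (String line : lines) {
            emitted.addAll(map(line));
        }
        return reduce(emitted);
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("Hadoop Hive Pig", "hive hadoop", "Storm");
        System.out.println(count(lines));
    }
}
```

In a real Hadoop job the framework does the grouping and distributes both
phases across the cluster; the sketch only shows the logic you would write
yourself.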

IMHO the best way to learn a new technology is to think of a small but real
application that YOU would like to have and then develop it step by step.
The user mailing list of the respective project will certainly help you.

Have fun,


2014-08-15 10:26 GMT+02:00 Db-Blog <>:

> Hey There,
> Thanks for suggesting the links below. However, I am already aware of how
> Hadoop works and have referred to those links in detail since I started with
> Hadoop. My apologies if my earlier email wasn't clear enough to explain my
> problem statement.
> Starting fresh again!
> I have experience in hadoop and worked on Bare metal and cloud
> implementations of big data e.g. Cloudera HD, Hortonworks HD and Amazon
> EMR's. During this affair I got a chance to explore Hive, Impala, Sqoop and
> Pig in detail and processed large data sets residing in HDFS. Also enjoyed
> playing with Shell Scripts to automate commands and orchestrate processes.
> All this was batch processing and majorly related to SQL.
> Now I want to move with Real-Time implementations and other technologies
> (mentioned in trailing mails); which definitely need Java Expertise.
> I am seeking guidance to learn specific java topics which will be needed
> for Hadoop only! Links/Posts/courses on the same will be really helpful.
> I also look forward to contribute and share my knowledge to the community.
> :)
> Thanks,
> Saurabh
> On 15-Aug-2014, at 5:09 am, Nishant Kelkar <> wrote:
> Hi Saurabh,
> Welcome to the world of Apache Hadoop! Here are a few good places to
> start:
> 1. Apache Hadoop Definitive Guide book:
> (you could find a free
> e-copy if you Google some :) )
> 2. Hadoop Javadocs:
> 3. If you want to install Hadoop on your local, Noll's tutorial on how to
> do so for a pseudo-distributed mode is really nice:
> 4. The way I started, is by experimenting with Hadoop on my Linux box
> terminal. You should definitely try out basic operations, like adding a
> file to HDFS from your local filesystem, copying a file from HDFS to your
> local, looking at filesystem size, moving files around in HDFS, etc. Here's
> where you can start:
> In general, I think you should also look at blogs/posts that help you
> distinguish Java from the other languages you've used (like HiveQL for
> example). How is Java different from C++? What is the difference between a
> declarative programming language and an object-oriented programming
> language? How does Java create objects? How does it manage them, and
> dispose of them? These are the questions you want to look into first, even
> before starting to write code in Java.
> Welcome to the group once again, and hope you'll be able to start
> contributing to the open-source community real quick! :)
> Best Regards,
> Nishant Kelkar
> On Thu, Aug 14, 2014 at 3:27 PM, Db-Blog <> wrote:
>> Greetings to everyone.
>> I am a newbie in Java and seek guidance in learning "Java specifically
>> required for Hadoop". It will be really helpful if someone can pass on
>> links/topics/online courses that can help me get started.
>> I come from an ETL & DB/SQL background and have been working on
>> Hive/Impala/Pig/Sqoop for a couple of years.
>> I have done some research on other tools of Big Data and Java will be
>> required in depth. Below is the list of tools analysed :
>> - Real time processing  (Apache Kafka and  Storm)
>> - Advance Searching (Solr/Lucene)
>> - Machine learning (Apache Mahout)
>> Please feel free to comment if I am off-base on anything.
>> Kindly suggest regarding the same and thanks for going thru the post and
>> providing your valuable time.
>> Thanks,
>> Saurabh
