hadoop-common-dev mailing list archives

From "Noelle Jakusz (c)" <njak...@vmware.com>
Subject RE: How to understand Hadoop source code ?
Date Thu, 18 Apr 2013 17:32:05 GMT
+1

There are quite a few new people, so maybe we could start a collaborative group where we collect
notes and steps (videos and articles). I have some that I created while getting started... it
would be a great idea to post them after some collaboration and review.

Thanks, Chris, for the detailed reply...

-----Original Message-----
From: Chris Nauroth [mailto:cnauroth@hortonworks.com] 
Sent: Thursday, April 18, 2013 1:14 PM
To: common-dev@hadoop.apache.org
Subject: Re: How to understand Hadoop source code ?

Is there a specific bug fix or feature that you are trying to contribute?
Specific questions like "How can I help with jira X?", "What is the main entry point when
I run the hdfs command?", "Where does the namenode serialize metadata to disk?", or "Where
does the secondary namenode execute a checkpoint?" can help focus the conversation.

AFAIK, we don't have a general code walkthrough document focused on onboarding new engineers.
This could be a valuable contribution if you want to gather notes while you learn. I think
this always works best if it's driven by a new engineer with review by an expert. (If the
experts write it, then they might accidentally skip something non-obvious that they've already
internalized.)

Since that document doesn't exist yet, the other option is to read the code, ideally while
trying to fix a specific bug that has been filed in jira. As you said, it's a relatively large
codebase, so it's impractical to read the whole thing top-to-bottom. Instead, it's important
to look for high-level clues that steer you towards the right files. I've found that the Maven
module structure and the Java package names are usually descriptive enough to point me in the
right direction. If you focus on getting familiar with those, you'll basically build a B-tree
inside your brain that helps you index into the right part of the codebase and answer your own
questions rapidly. Several examples:

"Where is the main entry point for the datanode daemon?": module hadoop-hdfs, package org.apache.hadoop.hdfs.server.datanode

"What is the algorithm for rebalancing an unbalanced cluster?": module hadoop-hdfs, package
org.apache.hadoop.hdfs.server.balancer

"How does YARN launch a new container process?": module hadoop-yarn-server-nodemanager, package
org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher

"Multiple daemons publish JMX metrics as a common concern.  Where is that
implemented?": module hadoop-common, package org.apache.hadoop.metrics2
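The module/package pairs above translate directly into directories on disk. A minimal sketch, assuming Maven's standard layout in which a module's main Java sources live under src/main/java: the hypothetical helper sourceDir below turns a module directory and a package name into the folder to open. The module directory is an assumption you supply relative to your checkout, since in the Hadoop source tree a module such as hadoop-hdfs may sit inside a parent directory rather than at the top level.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper (not part of Hadoop): maps a (module, package) pair to the
// directory holding its sources, assuming Maven's standard src/main/java layout.
class PackageLocator {
    // Maven's default location for main Java sources, relative to a module root.
    static final String MAIN_JAVA = "src/main/java";

    // Each '.' in a Java package name becomes one directory level.
    static String sourceDir(String moduleDir, String packageName) {
        return moduleDir + "/" + MAIN_JAVA + "/" + packageName.replace('.', '/');
    }

    public static void main(String[] args) {
        // Topic -> {module, package}, copied from the examples in this thread.
        // Module names are used directly as directory names here, which is an
        // assumption: prefix them with their parent directory in your checkout if needed.
        Map<String, String[]> index = new LinkedHashMap<>();
        index.put("datanode entry point",
                new String[] {"hadoop-hdfs", "org.apache.hadoop.hdfs.server.datanode"});
        index.put("cluster rebalancing",
                new String[] {"hadoop-hdfs", "org.apache.hadoop.hdfs.server.balancer"});
        index.put("YARN container launch",
                new String[] {"hadoop-yarn-server-nodemanager",
                        "org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher"});
        index.put("JMX metrics",
                new String[] {"hadoop-common", "org.apache.hadoop.metrics2"});

        // Print the "index into the codebase" as topic -> source directory.
        for (Map.Entry<String, String[]> entry : index.entrySet()) {
            String[] loc = entry.getValue();
            System.out.println(entry.getKey() + " -> " + sourceDir(loc[0], loc[1]));
        }
    }
}
```

Running it prints one line per topic, for example "cluster rebalancing -> hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/balancer".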

I hope this is helpful to get the process started for you.  We're always here to help if you
have specific follow-up questions.

Thanks,
--Chris


On Wed, Apr 17, 2013 at 10:33 PM, Prabakaran Krishnan <prabakaran_j2ee@yahoo.in> wrote:

> Could you please help me understand MapReduce in Hadoop?
>
>
>
> ________________________________
> From: Mohammad Mustaqeem <3m.mustaqeem@gmail.com>
> To: common-dev <common-dev@hadoop.apache.org>
> Sent: Thursday, 18 April 2013 10:44 AM
> Subject: Re: How to understand Hadoop source code ?
>
>
> I am interested in HDFS. Please guide me.
>
>
> On Thu, Apr 18, 2013 at 3:36 AM, Arun C Murthy <acm@hortonworks.com> wrote:
>
> > Please don't cross post.
> >
> > What parts of Hadoop are you interested in? HDFS? YARN? MapReduce?
> >
> > Arun
> >
> > On Apr 17, 2013, at 2:50 PM, Mohammad Mustaqeem wrote:
> >
> > > Hello everyone,
> > > I am new to this group. Since the Hadoop source code is very
> > > big, I am not able to understand it entirely.
> > > Is there any document that describes the code?
> > > Is there any way to understand the functionality of each class
> > > and its methods?
> > >
> > >
> > > --
> > > *With regards ---*
> > > *Mohammad Mustaqeem*,
> > > M.Tech (CSE)
> > > MNNIT Allahabad
> >
> > --
> > Arun C. Murthy
> > Hortonworks Inc.
> > http://hortonworks.com/
> >
> >
> >
>
>
> --
> *With regards ---*
> *Mohammad Mustaqeem*,
> M.Tech (CSE)
> MNNIT Allahabad
> 9026604270
>
