incubator-hama-dev mailing list archives

From Apurv Verma <dapu...@gmail.com>
Subject Re: Absolute Newbie
Date Sat, 12 Nov 2011 20:53:50 GMT
Hi,
 The Hama community is really very helpful. I just thought I would write
back to let you know that reading and understanding all the links is taking
some time. I have understood the basic overview of Hama and BSP.
Basically, here is what I have understood.

There is a BSPMaster, which acts as the master and takes all the decisions,
such as how to schedule tasks.
Then there are slaves, or GroomServers, which do the tasks.
Then there is ZooKeeper, which does the barrier synchronization.

Here is my question:
Normally, when we parallelize an algorithm, we split it into many threads
and then combine the answers they return in the master. So isn't ZooKeeper
just a part of the master? Why is it separate here? Don't the GroomServers
return the results to the BSPMaster, which then combines them? Where does
ZooKeeper fit in here?
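
A minimal sketch of the difference between barrier synchronization and
combining results may help frame the question. It is plain Java, not Hama
code: `CyclicBarrier` stands in for the ZooKeeper barrier, and the class name
`BarrierSketch`, the `run` method and the shared `inbox` array are assumptions
made up for illustration. Each peer sends a value to its neighbour, waits at
the barrier, and only then reads what it received; no master gathers or
combines anything.

```java
import java.util.Arrays;
import java.util.concurrent.BrokenBarrierException;
import java.util.concurrent.CyclicBarrier;
import java.util.concurrent.atomic.AtomicIntegerArray;

public class BarrierSketch {
    // Runs `peers` threads through two supersteps separated by one barrier.
    // Returns the value each peer received from its left neighbour.
    static int[] run(int peers) {
        // Stand-in for the ZooKeeper barrier (assumption, not Hama code).
        CyclicBarrier barrier = new CyclicBarrier(peers);
        AtomicIntegerArray inbox = new AtomicIntegerArray(peers);
        int[] received = new int[peers];
        Thread[] threads = new Thread[peers];
        for (int i = 0; i < peers; i++) {
            final int id = i;
            threads[i] = new Thread(() -> {
                // Superstep 1: "send" a message to the right neighbour.
                inbox.set((id + 1) % peers, id * 10);
                try {
                    // Nobody continues until every peer has sent its message.
                    barrier.await();
                } catch (InterruptedException | BrokenBarrierException e) {
                    throw new RuntimeException(e);
                }
                // Superstep 2: the message from superstep 1 is now safe to read.
                received[id] = inbox.get(id);
            });
            threads[i].start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new RuntimeException(e);
            }
        }
        return received;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(run(4)));
    }
}
```

With four peers, `run(4)` returns `[30, 0, 10, 20]`, because each peer reads
the value sent by the peer to its left. The barrier only answers "has everyone
finished this step?", which is a different job from scheduling tasks or
combining results, so it could plausibly be handled by a separate service.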


Now I am trying to understand HDFS and how the parallel graph search
algorithm, which is given as an example in this presentation [0], works.
I will get back as soon as I have done these.

[0]
http://www.slideshare.net/guest20d395b/apache-hama-an-introduction-tobulk-synchronization-parallel-on-hadoop
--
thanks and regards,

Apurv Verma
B. Tech.(CSE)
IIT- Ropar






On Fri, Nov 11, 2011 at 7:31 PM, Thomas Jungblut <
thomas.jungblut@googlemail.com> wrote:

> Hey,
>
> thanks for your interest. It is currently a bit chaotic and not well
> documented, but that's open source ;))
> I'll answer your questions one by one.
>
> > 1. I am an absolute newbie to Hama and Hadoop. Should I learn hadoop
> >    first before I can begin contributing to this project?
>
>
> We officially just use HDFS, so it is enough if you're familiar with the
> FileSystem API [1].
> This includes being familiar with the Writable interface [2], which
> lets you serialize and deserialize objects.
>
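For illustration, here is a sketch of what implementing Writable looks like.
To stay runnable without Hadoop on the classpath, it declares a local
interface with the same two methods as org.apache.hadoop.io.Writable
(`write(DataOutput)` and `readFields(DataInput)`). The `VertexDistance`
message type and the `roundTrip` helper are hypothetical and exist only for
this example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

public class WritableSketch {
    // Local stand-in with the same two methods as org.apache.hadoop.io.Writable,
    // so the sketch compiles without Hadoop.
    interface Writable {
        void write(DataOutput out) throws IOException;
        void readFields(DataInput in) throws IOException;
    }

    // Hypothetical message type: a vertex name and its distance.
    static class VertexDistance implements Writable {
        String vertex = "";
        int distance;

        @Override
        public void write(DataOutput out) throws IOException {
            out.writeUTF(vertex);
            out.writeInt(distance);
        }

        @Override
        public void readFields(DataInput in) throws IOException {
            // Fields must be read in the same order they were written.
            vertex = in.readUTF();
            distance = in.readInt();
        }
    }

    // Serializes a value to bytes and reads it back into a fresh object.
    static VertexDistance roundTrip(String vertex, int distance) {
        try {
            VertexDistance original = new VertexDistance();
            original.vertex = vertex;
            original.distance = distance;
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            original.write(new DataOutputStream(bytes));

            VertexDistance copy = new VertexDistance();
            copy.readFields(new DataInputStream(
                    new ByteArrayInputStream(bytes.toByteArray())));
            return copy;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        VertexDistance copy = roundTrip("v7", 3);
        System.out.println(copy.vertex + " " + copy.distance);
    }
}
```

The object controls its own byte layout, and `readFields` fills an existing
instance rather than constructing a new one.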
> > 2. I don't exactly understand how hama works and what it is. All I
> >    understand is that it's a graph library written over a distributed
> >    architecture hadoop.
> >    Where can get to know the basics of the hama, as I have already stated
> >    before that do I also need to learn hadoop?
>
>
> It is not necessarily a graph library; we are a BSP (Bulk Synchronous
> Parallel) framework. You can familiarize yourself with BSP by reading the
> Wikipedia article [3].
> However, you can solve graph problems with it, as well as matrix operations
> or other fancy stuff like real-time stream processing.
> As in the last question, you don't need to understand MapReduce (I guess
> that's what you mean by Hadoop in this case) to understand BSP, but once
> you understand BSP, you will understand MapReduce. Hope you get the
> directions ;)
>
> > 3. On the getting started page, instructions are given with Maven and
> >    SVN.
> >    I have experience with git and not these. I found that the mirror
> >    github repository and have forked it and would be working through it
> >    only. Is it OK?
>
>
> We work with patches (which are unified diffs), so this will also work
> with git. Sadly you can't skip Maven; it is a must-have.
> If you are aiming to be a long-term committer, no matter on which project
> at Apache, you will have to know how to use SVN.
> The Git repository is only a read-only mirror and is constantly synced
> from SVN.
> SVN is really easy, in my opinion easier than git, so this won't be a
> problem.
>
> > 4. To begin with I found this issue for newbies. HAMA-469.
> >    https://issues.apache.org/jira/browse/HAMA-469
> >    It says that statusUpdate() method should be called finally. So what I
> >    can see that there is a
> >    umbilical.statusUpdate(taskId, currentTaskStatus);
> >    I will put it in finally block. I dont understand what this piece of
> >    code wants to do. Basically what i have understood there is a cyclic
> >    barrier kind of thing so as to create a rendezvous for many threads.
> >    Some messages are combined and the function returns. I am still lost
> >    at understanding the codebase.
>
>
> Great that you've already found our issue tracker and the newbie issues.
> Sadly the description does not cover everything, e.g. the motivation.
> A quick explanation: if a failure occurs in the sync method, we want to
> update the "umbilical", so that it knows that the sync has failed.
> Adding a finally block is not the right way; you should take a look at the
> catch clause.
> Currently there is only an error log there, but we want to make the status
> update in this clause and make the process fail, i.e. throw a runtime
> exception.
>
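The catch-clause approach described above could be sketched roughly as
follows. This is not the Hama source: the `Umbilical` and `Barrier`
interfaces, the `sync` signature and the "FAILED" status string are
hypothetical stand-ins, and only the `statusUpdate` call name comes from this
thread.

```java
import java.util.ArrayList;
import java.util.List;

public class SyncFailureSketch {
    // Hypothetical stand-in for the umbilical; only the call quoted in the
    // thread is modelled, with simplified String arguments.
    interface Umbilical {
        void statusUpdate(String taskId, String status);
    }

    // Hypothetical stand-in for whatever the real sync waits on.
    interface Barrier {
        void await() throws Exception;
    }

    static void sync(String taskId, Barrier barrier, Umbilical umbilical) {
        try {
            barrier.await();
        } catch (Exception e) {
            // Keep the existing error log ...
            System.err.println("sync failed for " + taskId + ": " + e.getMessage());
            // ... but also report the failure, only on the failure path ...
            umbilical.statusUpdate(taskId, "FAILED");
            // ... and make the process fail instead of continuing silently.
            throw new RuntimeException("sync failed for " + taskId, e);
        }
    }

    public static void main(String[] args) {
        List<String> updates = new ArrayList<>();
        try {
            sync("task_1",
                 () -> { throw new Exception("barrier broken"); },
                 (id, status) -> updates.add(id + ":" + status));
        } catch (RuntimeException expected) {
            System.out.println("updates = " + updates);
        }
    }
}
```

A finally block would run the status update on successful syncs too, which is
why the update sits in the catch clause in this sketch.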
> Once you have read the Wikipedia article, I hope you will know what the
> sync method should do (send messages!).
> This isn't the whole story yet, but I think you can explore it for
> yourself by debugging a bit.
>
> > 5. I also found that I can apply to apache for a mentor. Here is my
> >    skillset [0] and I wish to become a long term contributor to projects
> >    centred around hadoop.
> >    [0] http://in.linkedin.com/in/apurv5
> >    I am really looking forward to becoming a full fledged contributor in a
> >    span of six months.
>
>
> Nice CV, but it is enough if you can code in Java and are creative in
> finding solutions. And actually making them run as well.
> I'm not sure if I can mentor you, but I guess we are all able to help you
> once you're facing a problem.
> Just ask on the mailing list or mail me directly ;)
>
> Hope I clarified a few things. Looking forward to hearing from you!
>
> Thomas
>
> [1] http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/fs/FileSystem.html
> [2] http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/io/Writable.html
> [3] http://en.wikipedia.org/wiki/Bulk_synchronous_parallel
>
> 2011/11/11 Apurv Verma <dapurv5@gmail.com>
>
> > Hii all,
> >
> > --
> > thanks and regards,
> >
> > Apurv Verma
> > B. Tech.(CSE)
> > IIT- Ropar
> >
>
>
>
> --
> Thomas Jungblut
> Berlin <thomas.jungblut@gmail.com>
>
