giraph-dev mailing list archives

From "Eli Reisman (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (GIRAPH-13) Port Giraph to YARN
Date Tue, 19 Feb 2013 17:13:14 GMT

    [ https://issues.apache.org/jira/browse/GIRAPH-13?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13581447#comment-13581447
] 

Eli Reisman commented on GIRAPH-13:
-----------------------------------

Thanks for your great review Hyunsik, great to hear from you!

I really appreciate your input! You successfully named ALL of my concerns! My biggest concern is
the IO formats, which, as you said, are completely dependent on MRv1. Your idea is exactly the
approach I was planning to take.

As for your first concern: yes, this is a draft version. The new one (I don't have a patch
up yet, but I will post one soon to show you) will be completely configurable from the
GiraphRunner CLI options.

As for your second concern: there is a need for history and a number of other basic systems we
get from MRv1 right now. Because of the timing (I am trying to finish this phase before the end
of March), I may attempt to make GIRAPH-13 cover just the following upgrade: a YARN profile for
Giraph, including the ability to run examples/ applications from the Giraph jar-with-dependencies
on YARN. I hope to handle all other "fleshing out" of the features in separate JIRAs or
sub-issues. This bounds the difficulty of this first stage and enables others to start working
on the feature-add JIRAs without having to know all about YARN.

The exciting thing is that the YARN API allows much finer-grained control of a lot of our
BSP process than Hadoop ever did. And I too was thinking that, after this, a port to Mesos (or
wherever) is going to be really easy! As time passes, we might consider moving the launch of our
ZooKeeper instance into the ApplicationMaster and doing more fine-grained resource allocation
control (assigning input splits right at the beginning of the job run, assigning hosts to the
workers as we choose for data locality, allotting memory and/or cores depending on the size of
the splits we assign, etc.). The options really open some doors.
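
To make the data-locality idea a bit more concrete, here is a minimal sketch of how splits
could be handed to worker hosts up front. It is only an illustration under assumptions:
SplitInfo and LocalityAssigner are hypothetical names, not Giraph or YARN classes, and the
greedy rule (prefer a host that holds the split, break ties by least assigned bytes) is just
one possible policy.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical description of one input split: a name, its size, and the hosts that store it. */
class SplitInfo {
    final String name;
    final long length;
    final List<String> hosts;

    SplitInfo(String name, long length, List<String> hosts) {
        this.name = name;
        this.length = length;
        this.hosts = hosts;
    }
}

/** Hypothetical greedy assigner: prefers local hosts, then the least-loaded host. */
class LocalityAssigner {
    /** Returns a map from split name to the worker host chosen for it. */
    static Map<String, String> assign(List<SplitInfo> splits, List<String> workerHosts) {
        if (workerHosts.isEmpty()) {
            throw new IllegalArgumentException("at least one worker host is required");
        }
        // Bytes assigned to each worker host so far.
        Map<String, Long> load = new LinkedHashMap<>();
        for (String w : workerHosts) {
            load.put(w, 0L);
        }
        Map<String, String> result = new LinkedHashMap<>();
        for (SplitInfo s : splits) {
            String best = null;
            boolean bestLocal = false;
            for (String w : workerHosts) {
                boolean local = s.hosts.contains(w);
                // A local host beats a non-local one; otherwise the lighter load wins.
                if (best == null
                        || (local && !bestLocal)
                        || (local == bestLocal && load.get(w) < load.get(best))) {
                    best = w;
                    bestLocal = local;
                }
            }
            result.put(s.name, best);
            load.put(best, load.get(best) + s.length);
        }
        return result;
    }
}
```

The same per-host byte totals could also feed a memory or core request per worker, which is
the "allot memory and/or cores depending on the size of the splits" part of the idea.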

BUT, even just to make the examples run, the IO problem must be solved. I do think wrapping
the MRv1-related functions (stuff that needs a TaskAttemptContext or Job-type classes from
Hadoop, and more) is the way to go, but I would sure appreciate any ideas you might have.
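
As a rough picture of the wrapping idea, and only a sketch under assumptions: the IO code
would depend on a small interface of its own, and a thin adapter would supply what currently
comes from a TaskAttemptContext. All names below (IoContext, MapBackedIoContext,
VertexReaderSketch, the "input.path" key) are hypothetical and are not Giraph or Hadoop API.
A real MRv1 adapter would delegate to the Hadoop classes instead of a plain Map.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical: the only view of job/task state that the IO formats would see. */
interface IoContext {
    String get(String key, String defaultValue);
    int taskId();
    void progress();
}

/** Hypothetical YARN-side implementation backed by a plain map rather than MRv1 classes. */
class MapBackedIoContext implements IoContext {
    private final Map<String, String> conf;
    private final int taskId;
    private int progressCalls = 0;

    MapBackedIoContext(Map<String, String> conf, int taskId) {
        this.conf = new HashMap<>(conf); // defensive copy
        this.taskId = taskId;
    }

    @Override
    public String get(String key, String defaultValue) {
        return conf.getOrDefault(key, defaultValue);
    }

    @Override
    public int taskId() {
        return taskId;
    }

    @Override
    public void progress() {
        progressCalls++; // stand-in for reporting liveness to the framework
    }

    int progressCalls() {
        return progressCalls;
    }
}

/** Hypothetical reader code that only knows about IoContext, not about MRv1 or YARN. */
class VertexReaderSketch {
    static String describe(IoContext ctx) {
        ctx.progress();
        return "task " + ctx.taskId() + " reads " + ctx.get("input.path", "<unset>");
    }
}
```

With this shape, the MRv1 profile and the YARN profile would each provide their own IoContext
implementation, and the reader code would stay the same in both.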

Anyway, I will put up another patch, hopefully tonight or tomorrow, that is another significant
upgrade over what you have seen here so far. All input and ideas are appreciated, thanks again!

                
> Port Giraph to YARN
> -------------------
>
>                 Key: GIRAPH-13
>                 URL: https://issues.apache.org/jira/browse/GIRAPH-13
>             Project: Giraph
>          Issue Type: New Feature
>            Reporter: Jakob Homan
>            Assignee: Eli Reisman
>         Attachments: GIRAPH-13-1.patch, GIRAPH-13-2.patch
>
>
> Now that YARN (aka MR2 aka MAPREDUCE-279) has been merged into the Hadoop trunk, we should
> think about what it would take to separate out the graph processing bits of Giraph from the
> MR1-specific code so as to take advantage of the less-MR centric aspects of YARN, while still
> supporting both over the medium term.

