giraph-dev mailing list archives

From "Eli Reisman (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (GIRAPH-13) Port Giraph to YARN
Date Tue, 19 Feb 2013 17:39:15 GMT

    [ https://issues.apache.org/jira/browse/GIRAPH-13?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13581473#comment-13581473 ]

Eli Reisman commented on GIRAPH-13:
-----------------------------------

Hey, one more idea to throw out there regarding all the IO format issues with YARN. What do
you think of this:

Since some of our internals are pretty bound up in some MRv1 classes, we can do the refactoring
and wrapping already discussed above to hide this dependency. Another approach I might
explore is a generic task runner (which owns GraphTaskManager and replaces GraphMapper in our
YARN implementation). It would instantiate the TaskAttemptContext and other Hadoop MRv1
classes, populate them with the info they need to run the job (taken from the giraphConfiguration
and/or from the YARN classes that report some of the same data to the running job), and hand
those off to the Giraph code that expects these objects. Since this activity is self-contained
in the runner class, no platform-dependent setup code (for YARN, Mesos, whoever) has to know
anything about the runner's internals: it just creates the runner, hands it the data it needs,
sets it running on the right compute nodes, etc.
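A rough sketch of the runner idea, in Java since that is what Giraph is written in. Everything here is hypothetical: LegacyTaskContext stands in for TaskAttemptContext and the other MRv1 objects, GraphWork stands in for the code path GraphTaskManager drives, and GiraphTaskRunner and the "job.id" key are made-up names. No real Hadoop or Giraph API is used; it only illustrates the shape, assuming launchers pass plain data in and only the runner fabricates the MRv1-style context.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for TaskAttemptContext and friends: the MRv1-style
// objects existing Giraph code expects. Not a real Hadoop class.
final class LegacyTaskContext {
  final String jobId;
  final int taskIndex;
  final int attempt;
  final Map<String, String> conf;

  LegacyTaskContext(String jobId, int taskIndex, int attempt,
                    Map<String, String> conf) {
    this.jobId = jobId;
    this.taskIndex = taskIndex;
    this.attempt = attempt;
    this.conf = conf;
  }

  // MRv1-style attempt id string, e.g. attempt_<job>_m_000003_0.
  String attemptId() {
    return String.format("attempt_%s_m_%06d_%d", jobId, taskIndex, attempt);
  }
}

// Hypothetical stand-in for the Giraph code path that expects MRv1 objects
// (today driven by GraphTaskManager from inside GraphMapper).
interface GraphWork {
  String execute(LegacyTaskContext context);
}

// Hypothetical generic runner: the only class that knows how to fabricate
// MRv1-style context objects. Platform launchers (YARN, Mesos, ...) just
// construct it with plain data and call run().
final class GiraphTaskRunner {
  // Hypothetical configuration key; not a real Giraph/Hadoop setting.
  static final String JOB_ID_KEY = "job.id";

  private final Map<String, String> giraphConf;
  private final int taskIndex;  // supplied by the platform (e.g. a YARN container)
  private final int attempt;    // supplied by the platform
  private final GraphWork work;

  GiraphTaskRunner(Map<String, String> giraphConf, int taskIndex, int attempt,
                   GraphWork work) {
    this.giraphConf = new HashMap<>(giraphConf); // defensive copy
    this.taskIndex = taskIndex;
    this.attempt = attempt;
    this.work = work;
  }

  String run() {
    // Populate the legacy context from the configuration plus the
    // platform-supplied data, then hand it to the unchanged Giraph code.
    String jobId = giraphConf.getOrDefault(JOB_ID_KEY, "local");
    LegacyTaskContext context =
        new LegacyTaskContext(jobId, taskIndex, attempt, giraphConf);
    return work.execute(context);
  }
}
```

A YARN or Mesos launcher would only do something like `new GiraphTaskRunner(conf, containerIndex, 0, work).run()` (containerIndex being a hypothetical platform-supplied value); switching platforms changes where those plain values come from, not the runner or the Giraph code behind it.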

This is a tiny bit hacky, but it gets the job done with minimal changes to existing code, leaves
room for future JIRAs to do more extensive refactors, and does not hide from the fact that we will
still carry dependencies on the Hadoop JARs for as long as we also support MRv1, so we will
have access to these classes to instantiate even on Mesos or YARN. I am not entirely sure
this approach is possible, but it's one I have toyed with as an alternative to the full
"wrap all MRv1 IO objects" approach.

Any opinions? I will be exploring the options for the IO dilemma in great detail later in
the week and will post my findings/opinions as I survey the landscape. I just need to get the
rest of the YARN job setup code done today and post that patch first...


> Port Giraph to YARN
> -------------------
>
>                 Key: GIRAPH-13
>                 URL: https://issues.apache.org/jira/browse/GIRAPH-13
>             Project: Giraph
>          Issue Type: New Feature
>            Reporter: Jakob Homan
>            Assignee: Eli Reisman
>         Attachments: GIRAPH-13-1.patch, GIRAPH-13-2.patch
>
>
> Now that YARN (aka MR2 aka MAPREDUCE-279) has been merged into the Hadoop trunk, we should
> think about what it would take to separate out the graph processing bits of Giraph from the
> MR1-specific code so as to take advantage of the less MR-centric aspects of YARN, while still
> supporting both over the medium term.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
