giraph-dev mailing list archives

From "Eli Reisman (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (GIRAPH-503) Refactor platform-independent CLI argument parsing in GiraphRunner into a separate class
Date Fri, 08 Feb 2013 00:37:13 GMT

     [ https://issues.apache.org/jira/browse/GIRAPH-503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eli Reisman updated GIRAPH-503:
-------------------------------

    Attachment: GIRAPH-503-4.patch

Great reviews on RB guys, thanks again. I have rethought a few things about this refactor,
and am posting a new patch (and new diff on RB) for you to look at.

I ran this new patch on a real cluster job and through mvn verify and it works fine, including
"bad config CLI" jobs and "good config" jobs. Yay!

I totally see what you're saying about the abundance of Runner code. The fact is, I have now
(I think) factored about as much of the platform-neutral BSP/Giraph arg parsing as I can into
the CliParserUtils class, and have made GiraphRunner very very short. I have also added an
option where any Runner class (your HiveRunner, my soon-to-be YarnRunner) can easily add options
to the CliParserUtils before it does its parse and GiraphConfiguration gets populated with
the results.
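
To make the "Runner adds its own options before the parse" idea concrete, here is a rough, self-contained sketch. It is not the code in the patch: the class name CliParserSketch, the addOption/parse methods, and the "yj" option are all hypothetical, and the real CliParserUtils may be shaped quite differently.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical stand-in for CliParserUtils; NOT the class from the patch. */
class CliParserSketch {
  /** Known options: option name -> whether the option takes a value. */
  private final Map<String, Boolean> options = new LinkedHashMap<>();

  CliParserSketch() {
    // Platform-neutral options shared by every runner (illustrative subset).
    addOption("w", true);   // worker count, takes a value
    addOption("q", false);  // quiet mode, a bare flag
  }

  /** Lets a platform-specific runner register extra options before parsing. */
  void addOption(String name, boolean hasArg) {
    options.put(name, hasArg);
  }

  /** Parses args into a name -> value map; unknown options are rejected. */
  Map<String, String> parse(String[] args) {
    Map<String, String> parsed = new HashMap<>();
    for (int i = 0; i < args.length; i++) {
      String arg = args[i];
      if (!arg.startsWith("-") || arg.length() < 2) {
        throw new IllegalArgumentException("Unexpected argument: " + arg);
      }
      String name = arg.substring(1);
      Boolean hasArg = options.get(name);
      if (hasArg == null) {
        throw new IllegalArgumentException("Unknown option: " + arg);
      }
      if (hasArg) {
        if (i + 1 >= args.length) {
          throw new IllegalArgumentException("Missing value for: " + arg);
        }
        parsed.put(name, args[++i]);  // consume the option's value
      } else {
        parsed.put(name, "true");     // bare flags are recorded as "true"
      }
    }
    return parsed;
  }
}
```

A platform-specific runner would call addOption("yj", true) (or whatever it needs) first, then parse, and copy the resulting map into its configuration. The point is only that the shared options live in one place and each runner extends them rather than duplicating the parsing.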

This gives us time to deal with the many GiraphRunners at a later date. Fact is, I will have
to really gut out GiraphJob and GiraphRunner in order to make them any less Hadoop dependent
than they are now. That really should happen in another JIRA. This refactor hopefully just
sets us up to make that easier when it happens.

@Alessandro: I might have found a bug in GiraphJob. When we call getInternalJob we set the
flag "jobInited" (one time only). I think this flag is a one-off so that only the internal
Hadoop call to getConfiguration() returns the one from the JobConf, and from then on
we get back our own giraphConfiguration. Thing is, now with the new edge/vertex input formats,
we are calling getInternalJob several times in a row to set up GiraphFileInputFormat (etc.),
and I think we are tripping the flag too soon (i.e. before Hadoop ever gets a look at our
JobConf).

1. is this true?

2. if we are not supplying the flag for Hadoop's benefit, then for whom? Can it be eliminated?
Again, we are setting it off several times very early in the job setup.
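
To illustrate what I mean by tripping the flag too soon, here is a toy model of the behavior as I understand it. It is not copied from GiraphJob: the class OneShotFlagSketch is hypothetical and the two configurations are stood in for by plain strings.

```java
/** Toy model of the suspected behavior; NOT the actual GiraphJob code. */
class OneShotFlagSketch {
  // Stand-ins for the Hadoop JobConf and our own GiraphConfiguration.
  private final String jobConf = "JobConf";
  private final String giraphConfiguration = "GiraphConfiguration";

  /** One-shot flag: false until the first getInternalJob() call. */
  private boolean jobInited = false;

  /** Returns the JobConf until the flag is tripped, our own config after. */
  String getConfiguration() {
    return jobInited ? giraphConfiguration : jobConf;
  }

  /** Every call trips the flag, including calls made during early setup. */
  OneShotFlagSketch getInternalJob() {
    jobInited = true;
    return this;
  }
}
```

In this model, the first setup call to getInternalJob() (say, to configure an input format) flips the flag, so any later getConfiguration() call, including the one Hadoop makes, already sees giraphConfiguration rather than the JobConf.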

Thanks! I can start another JIRA for that issue if it's a real issue.

                
> Refactor platform-independent CLI argument parsing in GiraphRunner into a separate class
> ----------------------------------------------------------------------------------------
>
>                 Key: GIRAPH-503
>                 URL: https://issues.apache.org/jira/browse/GIRAPH-503
>             Project: Giraph
>          Issue Type: Improvement
>            Reporter: Eli Reisman
>            Assignee: Eli Reisman
>            Priority: Minor
>         Attachments: GIRAPH-503-1.patch, GIRAPH-503-2.patch, GIRAPH-503-3.patch, GIRAPH-503-4.patch
>
>
> In order to run on non-Hadoop MR platforms, we will need to populate the GiraphConfiguration
> for our job in a platform-independent way so that all config options are available to whatever
> driver class initiates the Giraph job (not just GiraphRunner/GiraphJob). This also serves
> to clean up GiraphRunner in general.
> Passes 'mvn clean install'
> Review Board URL: https://reviews.apache.org/r/9350/

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
