giraph-user mailing list archives

From Jakob Homan <jgho...@gmail.com>
Subject Re: Suggestion: Maven archetype for simple giraph job
Date Thu, 22 Mar 2012 19:39:18 GMT
This is a great idea.  Let's make it happen!
-jg


On Thu, Mar 22, 2012 at 6:14 AM, Benjamin Heitmann
<benjamin.heitmann@deri.org> wrote:
> Hello,
>
> After my experiences with Giraph and Hadoop over the last few weeks, I would strongly suggest that a Maven archetype for a simple Giraph job
> be made available for new developers.
>
> Figuring out how to change the provided Giraph examples so that they are error-free in an IDE,
> and then how to run a unit test and an InternalVertexRunner, is manageable.
>
> However, deploying that same code to a real Hadoop cluster can be very time-consuming and frustrating.
>
> There is a strong chance that a few people from my research unit will also need to learn about Giraph and Hadoop,
> and providing a Maven archetype is how I would document my experiences for them.
>
>
> For that archetype I would suggest the following contents:
> * a pom.xml which declares the dependencies on Hadoop and specifies the assembly instructions for a jar that Hadoop can use
> (not with the dependency jars in ./lib as everybody on the web says, but with the jars unpacked into / )
> * an empty vertex class which is a subclass of HashMapVertex (with comments explaining that other classes like BasicVertex should never be subclassed by the user)
> * empty TextInputFormat
> * empty TextOutputFormat
> * an empty class with run() and a ToolRunner invocation, with comments explaining that this is an alternative to bin/giraph and how to use bin/giraph to the same effect
> (also explaining the more advanced things which a custom run() can do)
> * make sure that all classes can be called through bin/giraph as well (and debug GiraphRunner if there is still some error)
> * an empty Test class using InternalVertexRunner
> * everything should be able to run via the Test, the ToolRunner or bin/giraph, just without doing anything.
>
> I also consider this a good opportunity to learn about the best practices for using Giraph,
> and I think that I can probably work on that archetype in April.
>
> The archetype would be based on a cleaned-up, domain- and use-case-agnostic version of my code, which is currently here:
>  https://github.com/2nd-metaman/sa-rdf-giraph
>
> I am not sure how that would be distributed; probably using the same infrastructure
> which is required anyway for distributing a Giraph Maven artifact to the Apache Maven servers.
>
> Please let me know if you, as the Giraph community, think this is a good idea,
> and if you have additions and/or changes to what should go into the archetype.
>
>
> cheers, Benjamin.
