giraph-user mailing list archives

From Benjamin Heitmann <benjamin.heitm...@deri.org>
Subject Suggestion: Maven archetype for simple giraph job
Date Thu, 22 Mar 2012 13:14:28 GMT
Hello, 

after my experiences with Giraph and Hadoop over the last few weeks, I would strongly suggest that
a Maven archetype for a simple Giraph job should be made available for new developers.

Figuring out how to change the provided Giraph examples so that they are error-free in an IDE,
and then how to run a unit test and an InternalVertexRunner, is manageable.

However, deploying that same code to a real Hadoop cluster can be very time-consuming and frustrating.


There is a strong chance that a few people from my research unit will also need to learn about
Giraph and Hadoop, and providing a Maven archetype is the way in which I would document my
experiences for them.



For that archetype I would suggest the following contents:
* pom.xml which declares the dependencies on Hadoop, and which specifies the assembly instructions for a jar that Hadoop can use
(not with the dependencies in ./lib, as everybody on the web says, but with the unpacked jars in / )
* empty vertex class which is a subclass of HashMapVertex (with comments to explain that other classes like BasicVertex should never be subclassed by the user)
* empty TextInputFormat
* empty TextOutputFormat
* empty class with run() and ToolRunner invocation, and comments to explain that this is an alternative to bin/giraph, and how to use bin/giraph for the same effect
(also explain the more advanced things which a custom run() can do)
* make sure that all classes can be called through bin/giraph as well (and debug GiraphRunner if there is still some error)
* empty Test class using InternalVertexRunner
* everything should be able to run via the Test, the ToolRunner or bin/giraph out of the box, without any further changes.
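
To illustrate the assembly instructions from the first item: a minimal sketch of the relevant pom.xml fragment, assuming the standard maven-assembly-plugin with its predefined jar-with-dependencies descriptor. That plugin choice and the execution id are assumptions for illustration only, not something which is settled for the archetype.

```xml
<!-- Sketch only: goes inside <build><plugins> of the pom.xml. -->
<!-- Assumption: the predefined "jar-with-dependencies" descriptor is used, -->
<!-- which unpacks all dependency jars into the root of one single jar -->
<!-- (instead of placing them as jars under ./lib). -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-assembly-plugin</artifactId>
  <configuration>
    <descriptorRefs>
      <descriptorRef>jar-with-dependencies</descriptorRef>
    </descriptorRefs>
  </configuration>
  <executions>
    <execution>
      <!-- "make-assembly" is just an illustrative id -->
      <id>make-assembly</id>
      <!-- build the assembled jar as part of "mvn package" -->
      <phase>package</phase>
      <goals>
        <goal>single</goal>
      </goals>
    </execution>
  </executions>
</plugin>
```

With such a fragment, "mvn package" should produce an additional jar in target/ whose name ends in -jar-with-dependencies.jar, and that would be the jar to hand over to Hadoop.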

I also consider this a good opportunity to learn about the best practices of using Giraph,
and I think that I can probably work on that archetype in April.

The archetype would be based on a cleaned-up and domain/use-case agnostic version of my code,
which is currently here:
 https://github.com/2nd-metaman/sa-rdf-giraph

I am not sure how that would be distributed, probably using the same infrastructure
which is required anyway for distributing a Giraph Maven artifact to the Apache Maven servers.


Please let me know if you as the Giraph community think this is a good idea,
and if you have additions and/or changes to what should go inside the archetype.


cheers, Benjamin. 