incubator-giraph-user mailing list archives

From Avery Ching <ach...@apache.org>
Subject Re: Suggestion: Maven archetype for simple giraph job
Date Sat, 24 Mar 2012 07:06:03 GMT
Sounds good to me.  I would use EdgeListVertex as the parent class 
instead of HashMapVertex (saves memory).

Avery

On 3/22/12 12:39 PM, Jakob Homan wrote:
> This is a great idea.  Let's make it happen!
> -jg
>
>
> On Thu, Mar 22, 2012 at 6:14 AM, Benjamin Heitmann
> <benjamin.heitmann@deri.org>  wrote:
>> Hello,
>>
>> after my experiences with giraph and hadoop in the last weeks, I would strongly suggest that a maven archetype for a simple giraph job
>> should be made available for new developers.
>>
>> Figuring out how to change the provided giraph examples, in order to make them error free in an IDE,
>> and then how to run a unit test and an InternalVertexRunner is manageable.
>>
>> However, deploying that same code to a real hadoop cluster can be very time consuming and frustrating.
>>
>> There is a strong chance that a few people from my research unit will also need to learn about giraph and hadoop,
>> and providing a maven archetype is the way in which I would document my experiences for them.
>>
>>
>> For that archetype I would suggest the following contents:
>> * pom.xml which has dependencies on hadoop, and which specifies the assembly instructions for a jar that hadoop can use
>> (not ./lib as everybody on the web says, but unpacked jars in / )
>> * empty vertex class which is a subclass of HashMapVertex (with comments to explain that other classes like BasicVertex should never be subclassed by the user)
>> * empty TextInputFormat
>> * empty TextOutputFormat
>> * empty class with run() and ToolRunner invocation, and comments to explain that this is an alternative to bin/giraph, and how to use bin/giraph for the same effect
>> (also explain the more advanced things which a custom run() can do)
>> * make sure that all classes can be called through bin/giraph as well (and debug GiraphRunner if there still is some error)
>> * empty Test class using InternalVertexRunner
>> * everything should be able to run via the Test, the ToolRunner or bin/giraph as generated, without any further changes.
>>
>> I also consider this a good opportunity to learn about the best practices of using giraph,
>> and I think that I can probably work on that archetype in April.
>>
>> The archetype would be based on a cleaned up and domain/use-case agnostic version of my code which is currently here:
>>   https://github.com/2nd-metaman/sa-rdf-giraph
>>
>> I am not sure how that would be distributed, probably using the same infrastructure
>> which is required for distributing a giraph maven artefact to the apache maven servers anyway.
>>
>> Please let me know if you as the giraph community think this is a good idea,
>> and if you have additions and/or changes to what should go inside of the archetype.
>>
>>
>> cheers, Benjamin.

