giraph-user mailing list archives

From Benjamin Heitmann <benjamin.heitm...@deri.org>
Subject Re: Problem deploying Giraph job to hadoop cluster: onlineZooKeeperServers connection failure
Date Thu, 22 Mar 2012 13:03:09 GMT

Hello, 

I managed to resolve the issue myself.

On 21 Mar 2012, at 20:30, Benjamin Heitmann wrote:

> A few questions which come to my mind as a sort of checklist: 
> * are my assembly instructions in pom.xml and in hadoop-job.xml correct ? 

This was the deciding issue. My job jar packaged its dependencies as nested jar files in a
lib/ directory inside the job jar.
(Almost) all Google search results for assembling a Hadoop job as a jar suggest that
this is the right way to do it,
but it seems that Giraph or one of its dependencies changes the way in which the job
jar is loaded.

After listing the contents of giraph-*-jar-with-dependencies.jar (with jar -tf), I saw that all
dependency jars are unpacked in there, i.e. their classes sit directly in the jar.
I copied the relevant invocation of the maven assembly plugin into my project's pom.xml, adapted it,
and built the jar (with mvn clean assembly:assembly).
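
For anybody who wants to script the jar -tf check described above, here is a minimal sketch.
It only counts nested jars under lib/ versus class files stored directly in the job jar.
The class name JobJarLayout and its methods are hypothetical helpers of my own, not part of
Giraph or Hadoop, and the lib/ prefix is an assumption based on the layout I described.

```java
import java.io.IOException;
import java.util.Enumeration;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

// Hypothetical helper for illustration; not part of Giraph or Hadoop.
class JobJarLayout {

    // True for a dependency that is still packed as a jar under lib/ (assumed location).
    static boolean isNestedJar(String entryName) {
        return entryName.startsWith("lib/") && entryName.endsWith(".jar");
    }

    // True for a class file stored directly in the job jar.
    static boolean isUnpackedClass(String entryName) {
        return entryName.endsWith(".class");
    }

    // Returns {nested jar count, class file count} for the given jar.
    static int[] count(String jarPath) throws IOException {
        int nested = 0;
        int classes = 0;
        try (JarFile jar = new JarFile(jarPath)) {
            Enumeration<JarEntry> entries = jar.entries();
            while (entries.hasMoreElements()) {
                String name = entries.nextElement().getName();
                if (isNestedJar(name)) {
                    nested++;
                } else if (isUnpackedClass(name)) {
                    classes++;
                }
            }
        }
        return new int[] {nested, classes};
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java JobJarLayout <job.jar>");
            System.exit(1);
        }
        int[] counts = count(args[0]);
        System.out.println("nested jars under lib/: " + counts[0]);
        System.out.println("class files in the jar: " + counts[1]);
    }
}
```

A jar built the way that failed for me should report a non-zero count of nested jars, while a
jar-with-dependencies style jar should report none and a large number of class files.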

Then I submitted that jar to Hadoop. Running it through bin/giraph failed (with an error about
not being able to write using the output format).

However, bypassing bin/giraph and telling Hadoop to run my subclass of Tool via ToolRunner
worked.
I pushed the changes to pom.xml to the GitHub repo, if anybody wants to have a look:
 https://github.com/2nd-metaman/sa-rdf-giraph

So my problem of not being able to run my giraph job on a hadoop cluster *at all* is solved
for now. 


The error which I got when trying bin/giraph was reproducible in the same environment with
the PageRankBenchmark.
I can file an issue for that later, if somebody else can reproduce it.


In addition, I would strongly suggest making a maven archetype for a simple giraph job. 

I will start a new email thread for that. 

cheers, Benjamin. 



