incubator-giraph-user mailing list archives

From Avery Ching <ach...@apache.org>
Subject Re: maven, hadoop, zookeeper, and giraph!
Date Fri, 17 Feb 2012 03:07:58 GMT
Hi Jeffrey,

Best attempt at answers inline.

On 2/16/12 6:12 PM, Jeffrey Yunes wrote:
> Hi Giraph community,
> I think I followed all of the directions (for Giraph on a pseudo-cluster), and it looks like
>
>> mvn clean test -Dprop.mapred.job.tracker=localhost:9001
> runs fine. However, I'm new to the Hadoop infrastructure, and have a couple of questions about getting started with Giraph.
>
> 1)
>> hadoop jar target/giraph-0.2-SNAPSHOT-jar-with-dependencies.jar org.apache.giraph.benchmark.PageRankBenchmark -e 1 -s 3 -v -V 50 -w 3
> gives me the error "java.lang.NullPointerException at org.apache.giraph.benchmark.PageRankBenchmark.run(PageRankBenchmark.java:127)". It looks like some error with configuration?

This is a bug; sorry about that.  I opened an issue for it
(https://issues.apache.org/jira/browse/GIRAPH-150), and here is a quick
fix:

diff --git a/src/main/java/org/apache/giraph/benchmark/PageRankBenchmark.java b/
index 0e76122..4d08929 100644
--- a/src/main/java/org/apache/giraph/benchmark/PageRankBenchmark.java
+++ b/src/main/java/org/apache/giraph/benchmark/PageRankBenchmark.java
@@ -124,7 +124,8 @@ public class PageRankBenchmark extends EdgeListVertex<
      } else {
        job.setVertexClass(PageRankBenchmark.class);
      }
-    LOG.info("Using class " + BspUtils.getVertexClass(getConf()).getName());
+    LOG.info("Using class " +
+        BspUtils.getVertexClass(job.getConfiguration()).getName());
      job.setVertexInputFormatClass(PseudoRandomVertexInputFormat.class);
      job.setWorkerConfiguration(workers, workers, 100.0f);
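
My reading of why the one-line change matters, as a sketch rather than
the actual Hadoop or Giraph code: if the job object keeps its own copy
of the configuration, then a vertex class set through the job never
shows up in the conf returned by getConf(), the lookup returns null,
and calling getName() on it throws the NullPointerException.  The
classes `Conf` and `Job` and the key "vertex.class" below are
hypothetical stand-ins invented for illustration.

```java
import java.util.HashMap;
import java.util.Map;

/** Illustrative stand-ins only; these are NOT the Hadoop or Giraph classes. */
public class ConfCopySketch {
    /** Hypothetical minimal key/value configuration. */
    static class Conf {
        private final Map<String, String> values = new HashMap<>();

        Conf() {}

        /** Copy constructor: later changes to the copy do not affect the source. */
        Conf(Conf other) {
            values.putAll(other.values);
        }

        void set(String key, String value) {
            values.put(key, value);
        }

        String get(String key) {
            return values.get(key);
        }
    }

    /** Hypothetical job that copies the configuration it is constructed with. */
    static class Job {
        private final Conf conf;

        Job(Conf original) {
            this.conf = new Conf(original);
        }

        Conf getConfiguration() {
            return conf;
        }

        /** Stores the setting on the job's private copy only. */
        void setVertexClass(String className) {
            conf.set("vertex.class", className);
        }
    }

    public static void main(String[] args) {
        Conf toolConf = new Conf();          // plays the role of getConf()
        Job job = new Job(toolConf);         // job takes a copy
        job.setVertexClass("PageRankBenchmark");

        // The original conf never saw the setting, so this lookup yields null.
        System.out.println("tool conf: " + toolConf.get("vertex.class"));
        // The job's own conf has it.
        System.out.println("job conf:  " + job.getConfiguration().get("vertex.class"));
    }
}
```

In this sketch, reading the setting back through job.getConfiguration()
is the only lookup that sees it, which is the same substitution the
patch makes.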

> 2) How should I / do I enable the log4j? An appender that writes to the HDFS? How else could I grep all my logs for errors and things?
log4j is used by the task trackers to write the job logs.  If you click
on your running job in the JobTracker web page, you can then click into
each task and look at the logs under 'Task Logs'.  You can configure the
task tracker's log4j.properties to set the log level, but I believe the
default is INFO.
> 3) With regard to Giraph and maven, none of the directions suggested doing "local overrides." Therefore, why should I expect my Giraph installation to refer to libraries and configuration in "~/Applications/hadoop or zookeeper" rather than those in "~.m2/repo?"
Giraph builds a massive jar that has all the required classes and jars
to launch ZooKeeper and interact with Hadoop.  This makes for easy
deployment to a running cluster.

> 4) Why doesn't running maven for Giraph install hadoop along the way (or does it)?
Because there are so many versions of Hadoop, and if you are launching
through Hadoop, the Hadoop jar should already be on your classpath.

> I'd appreciate if you'd help improve my understanding!
No problem.  Welcome to Giraph!

> Thanks!
> -Jeff
>
>
>

