Hello, 

I am a final-year BSc Computer Science student using Apache Giraph for my final-year project and dissertation, and I would very much appreciate it if someone could help me with the following issue.

I am using Apache Giraph 1.1.0 Snapshot with Hadoop 0.20.203.0 and am having trouble running the ConnectedComponents example. I use the following command:

 hadoop jar /home/ghufran/Downloads/Giraph2/giraph/giraph-examples/target/giraph-examples-1.1.0-SNAPSHOT-for-hadoop-1.2.1-jar-with-dependencies.jar org.apache.giraph.GiraphRunner org.apache.giraph.examples.ConnectedComponentsComputation -vif org.apache.giraph.io.formats.IntIntNullTextVertexInputFormat -vip /user/ghufran/in/my_graph.txt -vof org.apache.giraph.io.formats.IdWithValueTextOutputFormat -op /user/ghufran/outCC -w 1


I believe it gets stuck in the input superstep, as the following is displayed in the terminal while the command is running:

14/03/30 10:48:46 INFO mapred.JobClient:  map 100% reduce 0%
14/03/30 10:48:50 INFO job.JobProgressTracker: Data from 1 workers - Loading data: 0 vertices loaded, 0 vertex input splits loaded; 0 edges loaded, 0 edge input splits loaded; min free memory on worker 1 - 109.01MB, average 109.01MB
14/03/30 10:48:55 INFO job.JobProgressTracker: Data from 1 workers - Loading data: 0 vertices loaded, 0 vertex input splits loaded; 0 edges loaded, 0 edge input splits loaded; min free memory on worker 1 - 109.01MB, average 109.01MB
14/03/30 10:49:00 INFO job.JobProgressTracker: Data from 1 workers - Loading data: 0 vertices loaded, 0 vertex input splits loaded; 0 edges loaded, 0 edge input splits loaded; min free memory on worker 1 - 108.78MB, average 108.78MB  
....

which I traced back to the following if statement in the toString() method of org.apache.giraph.job.CombinedWorkerProgress (in giraph-core):

if (isInputSuperstep()) {
  sb.append("Loading data: ");
  sb.append(verticesLoaded).append(" vertices loaded, ");
  sb.append(vertexInputSplitsLoaded).append(
      " vertex input splits loaded; ");
  sb.append(edgesLoaded).append(" edges loaded, ");
  sb.append(edgeInputSplitsLoaded).append(" edge input splits loaded");
}
// ... (code between these two fragments omitted)
sb.append("; min free memory on worker ").append(
    workerWithMinFreeMemory).append(" - ").append(
    DECIMAL_FORMAT.format(minFreeMemoryMB)).append("MB, average ").append(
    DECIMAL_FORMAT.format(freeMemoryMB)).append("MB");
  
So it seems to me that the input is not being loaded correctly. I am assuming there is something wrong either with my input format class or, probably more likely, with the graph I passed in?
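
To rule out that I am simply misreading the log, here is a minimal sketch of how the counters could be pulled out of the progress lines. ProgressLineCheck, counters and nothingLoaded are hypothetical names of my own, not part of Giraph, and the regular expression only assumes the message layout seen in my terminal output.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper (not part of Giraph) for reading "Loading data" progress lines. */
final class ProgressLineCheck {
  // Assumption: the counters appear exactly as in the terminal output quoted in this post.
  private static final Pattern LOADED = Pattern.compile(
      "(\\d+) vertices loaded, (\\d+) vertex input splits loaded");

  /** Returns {vertices, vertexSplits}, or null if the line is not a loading-progress line. */
  static long[] counters(String logLine) {
    Matcher m = LOADED.matcher(logLine);
    if (!m.find()) {
      return null;
    }
    return new long[] {Long.parseLong(m.group(1)), Long.parseLong(m.group(2))};
  }

  /** True if at least one progress line was seen and none reports a loaded vertex or split. */
  static boolean nothingLoaded(String[] logLines) {
    boolean sawProgress = false;
    for (String line : logLines) {
      long[] c = counters(line);
      if (c == null) {
        continue; // not a progress line, e.g. the mapred.JobClient output
      }
      sawProgress = true;
      if (c[0] > 0 || c[1] > 0) {
        return false;
      }
    }
    return sawProgress;
  }

  public static void main(String[] args) {
    String[] lines = {
      "14/03/30 10:48:46 INFO mapred.JobClient:  map 100% reduce 0%",
      "14/03/30 10:48:50 INFO job.JobProgressTracker: Data from 1 workers - Loading data: "
          + "0 vertices loaded, 0 vertex input splits loaded; 0 edges loaded, "
          + "0 edge input splits loaded; min free memory on worker 1 - 109.01MB, average 109.01MB"
    };
    System.out.println(nothingLoaded(lines)); // prints "true"
  }
}
```

Run over my output, every progress line reports zero vertices and zero vertex input splits, which is why I think the job never gets past loading.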

I pass in a small graph in which each line has the format: vertex id, vertex value, then the neighbours, all separated by tabs. My graph is shown below:

1 0 2
2 1 1 3
3 2 2
4 3 2   
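
To make the format concrete, here is a minimal sketch of how I understand each line should be read. GraphLineCheck and parseLine are hypothetical names of my own and this is not the Giraph implementation; I am assuming tokens are separated by a single tab or a single space, and that every token is an integer.

```java
import java.util.regex.Pattern;

/** Hypothetical helper (not part of Giraph) that parses one line of my graph file. */
final class GraphLineCheck {
  // Assumption: tokens are separated by a single tab or a single space.
  private static final Pattern SEPARATOR = Pattern.compile("[\t ]");

  /** Returns {id, value, neighbour...}; throws IllegalArgumentException on a malformed line. */
  static int[] parseLine(String line) {
    String[] tokens = SEPARATOR.split(line.trim());
    if (tokens.length < 2) {
      throw new IllegalArgumentException("Expected at least an id and a value: '" + line + "'");
    }
    int[] parsed = new int[tokens.length];
    for (int i = 0; i < tokens.length; i++) {
      // Integer.parseInt throws NumberFormatException for empty or non-numeric tokens.
      parsed[i] = Integer.parseInt(tokens[i]);
    }
    return parsed;
  }

  public static void main(String[] args) {
    // The four lines of my graph, written with explicit tab separators.
    String[] graph = {"1\t0\t2", "2\t1\t1\t3", "3\t2\t2", "4\t3\t2"};
    for (String line : graph) {
      int[] p = parseLine(line);
      System.out.println("id=" + p[0] + " value=" + p[1] + " neighbours=" + (p.length - 2));
    }
  }
}
```

All four lines of my graph pass this check, so if the file itself is at fault I would guess it is something this sketch does not cover, such as the file's location or encoding.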

The full output from running my command is shown below. If anyone could explain why I am not getting the expected output, I would greatly appreciate it.

Many thanks, 

Ghufran


FULL OUTPUT:


14/03/30 10:48:06 INFO utils.ConfigurationUtils: No edge input format specified. Ensure your InputFormat does not require one.
14/03/30 10:48:06 INFO utils.ConfigurationUtils: No edge output format specified. Ensure your OutputFormat does not require one.
14/03/30 10:48:06 INFO job.GiraphJob: run: Since checkpointing is disabled (default), do not allow any task retries (setting mapred.map.max.attempts = 0, old value = 4)
14/03/30 10:48:07 INFO job.GiraphJob: run: Tracking URL: http://ghufran:50030/jobdetails.jsp?jobid=job_201403301044_0001
14/03/30 10:48:45 INFO job.HaltApplicationUtils$DefaultHaltInstructionsWriter: writeHaltInstructions: To halt after next superstep execute: 'bin/halt-application --zkServer ghufran:22181 --zkNode /_hadoopBsp/job_201403301044_0001/_haltComputation'
14/03/30 10:48:45 INFO zookeeper.ZooKeeper: Client environment:zookeeper.version=3.4.5-1392090, built on 09/30/2012 17:52 GMT
14/03/30 10:48:45 INFO zookeeper.ZooKeeper: Client environment:host.name=ghufran
14/03/30 10:48:45 INFO zookeeper.ZooKeeper: Client environment:java.version=1.7.0_51
14/03/30 10:48:45 INFO zookeeper.ZooKeeper: Client environment:java.vendor=Oracle Corporation
14/03/30 10:48:45 INFO zookeeper.ZooKeeper: Client environment:java.home=/usr/lib/jvm/java-7-oracle/jre
14/03/30 10:48:45 INFO zookeeper.ZooKeeper: Client environment:java.class.path=/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../conf:/usr/lib/jvm/java-7-oracle/lib/tools.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/..:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../hadoop-core-0.20.203.0.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/aspectjrt-1.6.5.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/aspectjtools-1.6.5.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/commons-beanutils-1.7.0.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/commons-beanutils-core-1.8.0.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/commons-cli-1.2.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/commons-codec-1.4.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/commons-collections-3.2.1.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/commons-configuration-1.6.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/commons-daemon-1.0.1.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/commons-digester-1.8.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/commons-el-1.0.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/commons-httpclient-3.0.1.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/commons-lang-2.4.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/commons-logging-1.1.1.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/commons-logging-api-1.0.4.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/commons-math-2.1.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/commons-net-1.4.1.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/core-3.1.1.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/hsqldb-1.8.0.10.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/jackson-core-asl-1.0.1.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/jackson-mapper-asl-1.0.1.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/jasper-compiler-5.5.
12.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/jasper-runtime-5.5.12.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/jets3t-0.6.1.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/jetty-6.1.26.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/jetty-util-6.1.26.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/jsch-0.1.42.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/junit-4.5.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/kfs-0.2.2.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/log4j-1.2.15.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/mockito-all-1.8.5.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/oro-2.0.8.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/servlet-api-2.5-20081211.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/slf4j-api-1.4.3.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/slf4j-log4j12-1.4.3.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/xmlenc-0.52.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/jsp-2.1/jsp-2.1.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/jsp-2.1/jsp-api-2.1.jar
14/03/30 10:48:45 INFO zookeeper.ZooKeeper: Client environment:java.library.path=/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/native/Linux-amd64-64
14/03/30 10:48:45 INFO zookeeper.ZooKeeper: Client environment:java.io.tmpdir=/tmp
14/03/30 10:48:45 INFO zookeeper.ZooKeeper: Client environment:java.compiler=<NA>
14/03/30 10:48:45 INFO zookeeper.ZooKeeper: Client environment:os.name=Linux
14/03/30 10:48:45 INFO zookeeper.ZooKeeper: Client environment:os.arch=amd64
14/03/30 10:48:45 INFO zookeeper.ZooKeeper: Client environment:os.version=3.8.0-35-generic
14/03/30 10:48:45 INFO zookeeper.ZooKeeper: Client environment:user.name=ghufran
14/03/30 10:48:45 INFO zookeeper.ZooKeeper: Client environment:user.home=/home/ghufran
14/03/30 10:48:45 INFO zookeeper.ZooKeeper: Client environment:user.dir=/home/ghufran/Downloads/hadoop-0.20.203.0/bin
14/03/30 10:48:45 INFO zookeeper.ZooKeeper: Initiating client connection, connectString=ghufran:22181 sessionTimeout=60000 watcher=org.apache.giraph.job.JobProgressTracker@209fa588
14/03/30 10:48:45 INFO mapred.JobClient: Running job: job_201403301044_0001
14/03/30 10:48:45 INFO zookeeper.ClientCnxn: Opening socket connection to server ghufran/127.0.1.1:22181. Will not attempt to authenticate using SASL (unknown error)
14/03/30 10:48:45 INFO zookeeper.ClientCnxn: Socket connection established to ghufran/127.0.1.1:22181, initiating session
14/03/30 10:48:45 INFO zookeeper.ClientCnxn: Session establishment complete on server ghufran/127.0.1.1:22181, sessionid = 0x1451263c44c0002, negotiated timeout = 600000
14/03/30 10:48:45 INFO job.JobProgressTracker: Data from 1 workers - Loading data: 0 vertices loaded, 0 vertex input splits loaded; 0 edges loaded, 0 edge input splits loaded; min free memory on worker 1 - 109.01MB, average 109.01MB
14/03/30 10:48:46 INFO mapred.JobClient:  map 100% reduce 0%
14/03/30 10:48:50 INFO job.JobProgressTracker: Data from 1 workers - Loading data: 0 vertices loaded, 0 vertex input splits loaded; 0 edges loaded, 0 edge input splits loaded; min free memory on worker 1 - 109.01MB, average 109.01MB
14/03/30 10:48:55 INFO job.JobProgressTracker: Data from 1 workers - Loading data: 0 vertices loaded, 0 vertex input splits loaded; 0 edges loaded, 0 edge input splits loaded; min free memory on worker 1 - 109.01MB, average 109.01MB
14/03/30 10:49:00 INFO job.JobProgressTracker: Data from 1 workers - Loading data: 0 vertices loaded, 0 vertex input splits loaded; 0 edges loaded, 0 edge input splits loaded; min free memory on worker 1 - 108.78MB, average 108.78MB