giraph-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Bryan Rowe II <bryanrow...@yahoo.com>
Subject Errors only with large graphs
Date Thu, 03 Jul 2014 20:06:47 GMT
Hello,

Giraph: release-1.0.0-RC3

In short, when I use large graphs with the Shortest Paths example, it fails.  But when I
use the small graph provided on the Quick Start guide, it succeeds.
I converted all of my large graphs into the format shown in the Quick Start guide to simply
things.
I'm using a one-node setup.



Here is the command I'm using:
hadoop jar $GIRAPH_HOME/giraph-examples/target/giraph-examples-1.0.0-for-hadoop-0.20.203.0-jar-with-dependencies.jar


org.apache.giraph.GiraphRunner org.apache.giraph.examples.SimpleShortestPathsVertex 

-vif org.apache.giraph.io.formats.JsonLongDoubleFloatDoubleVertexInputFormat
-vip /user/ubuntu/input/CA.txt -of org.apache.giraph.io.formats.IdWithValueTextOutputFormat
-op /user/ubuntu/output/shortestpaths 

-w 1

(all on one line)

CA.txt is a large graph file: 96,026,228 bytes

The job fails in 10mins, 46sec.

Two Map tasks are created when run.
The first one, task_201407021636_0006_m_000000, is KILLED.  syslog:

===========================================================================

2014-07-02 17:01:34,757 INFO org.apache.hadoop.util.NativeCodeLoader: Loaded the native-hadoop
library
2014-07-02 17:01:34,945 WARN org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Source name
ugi already exists!
2014-07-02 17:01:35,127 INFO org.apache.giraph.graph.GraphTaskManager: setup: Log level remains
at info
2014-07-02 17:01:35,159 INFO org.apache.giraph.graph.GraphTaskManager: Distributed cache is
empty. Assuming fatjar.
2014-07-02 17:01:35,159 INFO org.apache.giraph.graph.GraphTaskManager: setup: classpath @
/home/ubuntu/hdfstmp/mapred/local/taskTracker/ubuntu/jobcache/job_201407021636_0006/jars/job.jar
for job Giraph: org.apache.giraph.examples.SimpleShortestPathsVertex
2014-07-02 17:01:35,201 INFO org.apache.giraph.zk.ZooKeeperManager: createCandidateStamp:
Made the directory _bsp/_defaultZkManagerDir/job_201407021636_0006
2014-07-02 17:01:35,204 INFO org.apache.giraph.zk.ZooKeeperManager: createCandidateStamp:
Creating my filestamp _bsp/_defaultZkManagerDir/job_201407021636_0006/_task/ec2-54-186-5-159.us-west-2.compute.amazonaws.com
0
2014-07-02 17:01:35,219 INFO org.apache.giraph.zk.ZooKeeperManager: getZooKeeperServerList:
Got [ec2-54-186-5-159.us-west-2.compute.amazonaws.com] 1 hosts from 1 candidates when 1 required
(polling period is 3000) on attempt 0
2014-07-02 17:01:35,220 INFO org.apache.giraph.zk.ZooKeeperManager: createZooKeeperServerList:
Creating the final ZooKeeper file '_bsp/_defaultZkManagerDir/job_201407021636_0006/zkServerList_ec2-54-186-5-159.us-west-2.compute.amazonaws.com
0 '
2014-07-02 17:01:35,228 INFO org.apache.giraph.zk.ZooKeeperManager: getZooKeeperServerList:
For task 0, got file 'zkServerList_ec2-54-186-5-159.us-west-2.compute.amazonaws.com 0 ' (polling
period is 3000)
2014-07-02 17:01:35,228 INFO org.apache.giraph.zk.ZooKeeperManager: getZooKeeperServerList:
Found [ec2-54-186-5-159.us-west-2.compute.amazonaws.com, 0] 2 hosts in filename 'zkServerList_ec2-54-186-5-159.us-west-2.compute.amazonaws.com
0 '
2014-07-02 17:01:35,229 INFO org.apache.giraph.zk.ZooKeeperManager: onlineZooKeeperServers:
Trying to delete old directory /home/ubuntu/hdfstmp/mapred/local/taskTracker/ubuntu/jobcache/job_201407021636_0006/work/_bspZooKeeper
2014-07-02 17:01:35,234 INFO org.apache.giraph.zk.ZooKeeperManager: generateZooKeeperConfigFile:
Creating file /home/ubuntu/hdfstmp/mapred/local/taskTracker/ubuntu/jobcache/job_201407021636_0006/work/_bspZooKeeper/zoo.cfg
in /home/ubuntu/hdfstmp/mapred/local/taskTracker/ubuntu/jobcache/job_201407021636_0006/work/_bspZooKeeper
with base port 22181
2014-07-02 17:01:35,234 INFO org.apache.giraph.zk.ZooKeeperManager: generateZooKeeperConfigFile:
Make directory of _bspZooKeeper = true
2014-07-02 17:01:35,235 INFO org.apache.giraph.zk.ZooKeeperManager: generateZooKeeperConfigFile:
Delete of zoo.cfg = false
2014-07-02 17:01:35,236 INFO org.apache.giraph.zk.ZooKeeperManager: onlineZooKeeperServers:
Attempting to start ZooKeeper server with command [/usr/lib/jvm/java-7-oracle/jre/bin/java,
-Xmx512m, -XX:ParallelGCThreads=4, -XX:+UseConcMarkSweepGC, -XX:CMSInitiatingOccupancyFraction=70,
-XX:MaxGCPauseMillis=100, -cp, /home/ubuntu/hdfstmp/mapred/local/taskTracker/ubuntu/jobcache/job_201407021636_0006/jars/job.jar,
org.apache.zookeeper.server.quorum.QuorumPeerMain, /home/ubuntu/hdfstmp/mapred/local/taskTracker/ubuntu/jobcache/job_201407021636_0006/work/_bspZooKeeper/zoo.cfg]
in directory /home/ubuntu/hdfstmp/mapred/local/taskTracker/ubuntu/jobcache/job_201407021636_0006/work/_bspZooKeeper
2014-07-02 17:01:35,238 INFO org.apache.giraph.zk.ZooKeeperManager: onlineZooKeeperServers:
Shutdown hook added.
2014-07-02 17:01:35,238 INFO org.apache.giraph.zk.ZooKeeperManager: onlineZooKeeperServers:
Connect attempt 0 of 10 max trying to connect to ec2-54-186-5-159.us-west-2.compute.amazonaws.com:22181
with poll msecs = 3000
2014-07-02 17:01:35,241 WARN org.apache.giraph.zk.ZooKeeperManager: onlineZooKeeperServers:
Got ConnectException
java.net.ConnectException: Connection refused at java.net.PlainSocketImpl.socketConnect(Native
Method) at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339) at
java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200) at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392) at java.net.Socket.connect(Socket.java:579)
at org.apache.giraph.zk.ZooKeeperManager.onlineZooKeeperServers(ZooKeeperManager.java:701)
at org.apache.giraph.graph.GraphTaskManager.startZooKeeperManager(GraphTaskManager.java:357)
at org.apache.giraph.graph.GraphTaskManager.setup(GraphTaskManager.java:188) at org.apache.giraph.graph.GraphMapper.setup(GraphMapper.java:60)
at org.apache.giraph.graph.GraphMapper.run(GraphMapper.java:90) at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:763)
at
 org.apache.hadoop.mapred.MapTask.run(MapTask.java:369) at org.apache.hadoop.mapred.Child$4.run(Child.java:259)
at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059) at
org.apache.hadoop.mapred.Child.main(Child.java:253)
2014-07-02 17:01:38,242 INFO org.apache.giraph.zk.ZooKeeperManager: onlineZooKeeperServers:
Connect attempt 1 of 10 max trying to connect to ec2-54-186-5-159.us-west-2.compute.amazonaws.com:22181
with poll msecs = 3000
2014-07-02 17:01:38,243 INFO org.apache.giraph.zk.ZooKeeperManager: onlineZooKeeperServers:
Connected to ec2-54-186-5-159.us-west-2.compute.amazonaws.com/172.31.45.24:22181!
2014-07-02 17:01:38,243 INFO org.apache.giraph.zk.ZooKeeperManager: onlineZooKeeperServers:
Creating my filestamp _bsp/_defaultZkManagerDir/job_201407021636_0006/_zkServer/ec2-54-186-5-159.us-west-2.compute.amazonaws.com
0
2014-07-02 17:01:40,249 INFO org.apache.giraph.graph.GraphTaskManager: setup: Chosen to run
ZooKeeper...
2014-07-02 17:01:40,249 INFO org.apache.giraph.graph.GraphTaskManager: setup: Starting up
BspServiceMaster (master thread)...
2014-07-02 17:01:40,260 INFO org.apache.giraph.bsp.BspService: BspService: Connecting to ZooKeeper
with job job_201407021636_0006, 0 on ec2-54-186-5-159.us-west-2.compute.amazonaws.com:22181
2014-07-02 17:01:40,270 INFO org.apache.zookeeper.ZooKeeper: Client environment:zookeeper.version=3.3.3-1073969,
built on 02/23/2011 22:27 GMT
2014-07-02 17:01:40,270 INFO org.apache.zookeeper.ZooKeeper: Client environment:host.name=ec2-54-186-5-159.us-west-2.compute.amazonaws.com
2014-07-02 17:01:40,270 INFO org.apache.zookeeper.ZooKeeper: Client environment:java.version=1.7.0_60
2014-07-02 17:01:40,270 INFO org.apache.zookeeper.ZooKeeper: Client environment:java.vendor=Oracle
Corporation
2014-07-02 17:01:40,271 INFO org.apache.zookeeper.ZooKeeper: Client environment:java.home=/usr/lib/jvm/java-7-oracle/jre
2014-07-02 17:01:40,271 INFO org.apache.zookeeper.ZooKeeper: Client
 environment:java.class.path=/home/ubuntu/hdfstmp/mapred/local/taskTracker/ubuntu/jobcache/job_201407021636_0006/jars/classes:/home/ubuntu/hdfstmp/mapred/local/taskTracker/ubuntu/jobcache/job_201407021636_0006/jars:/home/ubuntu/hdfstmp/mapred/local/taskTracker/ubuntu/jobcache/job_201407021636_0006/attempt_201407021636_0006_m_000000_0/work:/home/ubuntu/hadoop/bin/../conf:/usr/lib/jvm/java-7-oracle/lib/tools.jar:/home/ubuntu/hadoop/bin/..:/home/ubuntu/hadoop/bin/../hadoop-core-0.20.203.0.jar:/home/ubuntu/hadoop/bin/../lib/aspectjrt-1.6.5.jar:/home/ubuntu/hadoop/bin/../lib/aspectjtools-1.6.5.jar:/home/ubuntu/hadoop/bin/../lib/commons-beanutils-1.7.0.jar:/home/ubuntu/hadoop/bin/../lib/commons-beanutils-core-1.8.0.jar:/home/ubuntu/hadoop/bin/../lib/commons-cli-1.2.jar:/home/ubuntu/hadoop/bin/../lib/commons-codec-1.4.jar:/home/ubuntu/hadoop/bin/../lib/commons-collections-3.2.1.jar:/home/ubuntu/hadoop/bin/../lib/commons-configuration-1.6.jar:/home/ubuntu/hadoo
p/bin/../lib/commons-daemon-1.0.1.jar:/home/ubuntu/hadoop/bin/../lib/commons-digester-1.8.jar:/home/ubuntu/hadoop/bin/../lib/commons-el-1.0.jar:/home/ubuntu/hadoop/bin/../lib/commons-httpclient-3.0.1.jar:/home/ubuntu/hadoop/bin/../lib/commons-lang-2.4.jar:/home/ubuntu/hadoop/bin/../lib/commons-logging-1.1.1.jar:/home/ubuntu/hadoop/bin/../lib/commons-logging-api-1.0.4.jar:/home/ubuntu/hadoop/bin/../lib/commons-math-2.1.jar:/home/ubuntu/hadoop/bin/../lib/commons-net-1.4.1.jar:/home/ubuntu/hadoop/bin/../lib/core-3.1.1.jar:/home/ubuntu/hadoop/bin/../lib/hsqldb-1.8.0.10.jar:/home/ubuntu/hadoop/bin/../lib/jackson-core-asl-1.0.1.jar:/home/ubuntu/hadoop/bin/../lib/jackson-mapper-asl-1.0.1.jar:/home/ubuntu/hadoop/bin/../lib/jasper-compiler-5.5.12.jar:/home/ubuntu/hadoop/bin/../lib/jasper-runtime-5.5.12.jar:/home/ubuntu/hadoop/bin/../lib/jets3t-0.6.1.jar:/home/ubuntu/hadoop/bin/../lib/jetty-6.1.26.jar:/home/ubuntu/hadoop/bin/../lib/jetty-util-6.1.26.jar:/home/ubu
ntu/hadoop/bin/../lib/jsch-0.1.42.jar:/home/ubuntu/hadoop/bin/../lib/junit-4.5.jar:/home/ubuntu/hadoop/bin/../lib/kfs-0.2.2.jar:/home/ubuntu/hadoop/bin/../lib/log4j-1.2.15.jar:/home/ubuntu/hadoop/bin/../lib/mockito-all-1.8.5.jar:/home/ubuntu/hadoop/bin/../lib/oro-2.0.8.jar:/home/ubuntu/hadoop/bin/../lib/servlet-api-2.5-20081211.jar:/home/ubuntu/hadoop/bin/../lib/slf4j-api-1.4.3.jar:/home/ubuntu/hadoop/bin/../lib/slf4j-log4j12-1.4.3.jar:/home/ubuntu/hadoop/bin/../lib/xmlenc-0.52.jar:/home/ubuntu/hadoop/bin/../lib/jsp-2.1/jsp-2.1.jar:/home/ubuntu/hadoop/bin/../lib/jsp-2.1/jsp-api-2.1.jar
2014-07-02 17:01:40,271 INFO org.apache.zookeeper.ZooKeeper: Client environment:java.library.path=/home/ubuntu/hadoop/bin/../lib/native/Linux-amd64-64:/home/ubuntu/hdfstmp/mapred/local/taskTracker/ubuntu/jobcache/job_201407021636_0006/attempt_201407021636_0006_m_000000_0/work
2014-07-02 17:01:40,271 INFO org.apache.zookeeper.ZooKeeper: Client environment:java.io.tmpdir=/home/ubuntu/hdfstmp/mapred/local/taskTracker/ubuntu/jobcache/job_201407021636_0006/attempt_201407021636_0006_m_000000_0/work/tmp
2014-07-02 17:01:40,271 INFO org.apache.zookeeper.ZooKeeper: Client environment:java.compiler=<NA>
2014-07-02 17:01:40,271 INFO org.apache.zookeeper.ZooKeeper: Client environment:os.name=Linux
2014-07-02 17:01:40,271 INFO org.apache.zookeeper.ZooKeeper: Client environment:os.arch=amd64
2014-07-02 17:01:40,271 INFO org.apache.zookeeper.ZooKeeper: Client environment:os.version=3.2.0-58-virtual
2014-07-02 17:01:40,271 INFO org.apache.zookeeper.ZooKeeper: Client environment:user.name=ubuntu
2014-07-02 17:01:40,271 INFO org.apache.zookeeper.ZooKeeper: Client environment:user.home=/home/ubuntu
2014-07-02 17:01:40,271 INFO org.apache.zookeeper.ZooKeeper: Client environment:user.dir=/home/ubuntu/hdfstmp/mapred/local/taskTracker/ubuntu/jobcache/job_201407021636_0006/attempt_201407021636_0006_m_000000_0/work
2014-07-02 17:01:40,272 INFO org.apache.zookeeper.ZooKeeper: Initiating client connection,
connectString=ec2-54-186-5-159.us-west-2.compute.amazonaws.com:22181 sessionTimeout=60000
watcher=org.apache.giraph.master.BspServiceMaster@4deb9df0
2014-07-02 17:01:40,284 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to
server ec2-54-186-5-159.us-west-2.compute.amazonaws.com/172.31.45.24:22181
2014-07-02 17:01:40,296 INFO org.apache.zookeeper.ClientCnxn: Socket connection established
to ec2-54-186-5-159.us-west-2.compute.amazonaws.com/172.31.45.24:22181, initiating session
2014-07-02 17:01:42,336 INFO org.apache.zookeeper.ClientCnxn: Session establishment complete
on server ec2-54-186-5-159.us-west-2.compute.amazonaws.com/172.31.45.24:22181, sessionid =
0x146f806378d0000, negotiated timeout = 600000
2014-07-02 17:01:42,337 INFO org.apache.giraph.bsp.BspService: process: Asynchronous connection
complete.
2014-07-02 17:01:42,343 INFO org.apache.giraph.graph.GraphTaskManager: map: No need to do
anything when not a worker
2014-07-02 17:01:42,343 INFO org.apache.giraph.graph.GraphTaskManager: cleanup: Starting for
MASTER_ZOOKEEPER_ONLY
2014-07-02 17:01:45,033 INFO org.apache.giraph.bsp.BspService: getJobState: Job state already
exists (/_hadoopBsp/job_201407021636_0006/_masterJobState)
2014-07-02 17:01:45,038 INFO org.apache.giraph.master.BspServiceMaster: becomeMaster: First
child is '/_hadoopBsp/job_201407021636_0006/_masterElectionDir/ec2-54-186-5-159.us-west-2.compute.amazonaws.com_00000000000'
and my bid is '/_hadoopBsp/job_201407021636_0006/_masterElectionDir/ec2-54-186-5-159.us-west-2.compute.amazonaws.com_00000000000'
2014-07-02 17:01:45,044 INFO org.apache.giraph.bsp.BspService: getApplicationAttempt: Node
/_hadoopBsp/job_201407021636_0006/_applicationAttemptsDir already exists!
2014-07-02 17:01:45,049 INFO org.apache.giraph.bsp.BspService: process: applicationAttemptChanged
signaled
2014-07-02 17:01:45,089 INFO org.apache.giraph.comm.netty.NettyServer: NettyServer: Using
execution handler with 8 threads after requestFrameDecoder.
2014-07-02 17:01:45,111 INFO org.apache.giraph.comm.netty.NettyServer: start: Started server
communication server: ec2-54-186-5-159.us-west-2.compute.amazonaws.com/172.31.45.24:30000
with up to 16 threads on bind attempt 0 with sendBufferSize = 32768 receiveBufferSize = 524288
backlog = 1
2014-07-02 17:01:45,116 INFO org.apache.giraph.comm.netty.NettyClient: NettyClient: Using
execution handler with 8 threads after requestEncoder.
2014-07-02 17:01:45,119 INFO org.apache.giraph.master.BspServiceMaster: becomeMaster: I am
now the master!
2014-07-02 17:01:45,126 INFO org.apache.giraph.bsp.BspService: getApplicationAttempt: Node
/_hadoopBsp/job_201407021636_0006/_applicationAttemptsDir already exists!
2014-07-02 17:01:45,140 INFO org.apache.giraph.io.formats.GiraphFileInputFormat: Total input
paths to process : 1
2014-07-02 17:01:45,149 INFO org.apache.giraph.master.BspServiceMaster: generateVertexInputSplits:
Got 2 input splits for 1 input threads
2014-07-02 17:01:45,149 INFO org.apache.giraph.master.BspServiceMaster: createVertexInputSplits:
Starting to write input split data to zookeeper with 1 threads
2014-07-02 17:01:45,163 INFO org.apache.giraph.master.BspServiceMaster: createVertexInputSplits:
Done writing input split data to zookeeper
2014-07-02 17:01:45,173 INFO org.apache.giraph.comm.netty.NettyClient: Using Netty without
authentication.
2014-07-02 17:01:45,187 INFO org.apache.giraph.comm.netty.NettyClient: connectAllAddresses:
Successfully added 1 connections, (1 total connected) 0 failed, 0 failures total.
2014-07-02 17:01:45,188 INFO org.apache.giraph.partition.PartitionUtils: computePartitionCount:
Creating 1, default would have been 1 partitions.
2014-07-02 17:01:45,350 INFO org.apache.giraph.comm.netty.NettyServer: start: Using Netty
without authentication.
2014-07-02 17:06:45,347 INFO org.apache.giraph.master.BspServiceMaster: barrierOnWorkerList:
0 out of 1 workers finished on superstep -1 on path /_hadoopBsp/job_201407021636_0006/_vertexInputSplitDoneDir
2014-07-02 17:06:45,349 INFO org.apache.giraph.master.BspServiceMaster: barrierOnWorkerList:
Waiting on [ec2-54-186-5-159.us-west-2.compute.amazonaws.com_1]
2014-07-02 17:11:45,355 INFO org.apache.giraph.master.BspServiceMaster: barrierOnWorkerList:
0 out of 1 workers finished on superstep -1 on path /_hadoopBsp/job_201407021636_0006/_vertexInputSplitDoneDir
2014-07-02 17:11:45,355 INFO org.apache.giraph.master.BspServiceMaster: barrierOnWorkerList:
Waiting on [ec2-54-186-5-159.us-west-2.compute.amazonaws.com_1]
2014-07-02 17:12:07,201 INFO org.apache.giraph.zk.ZooKeeperManager: run: Shutdown hook started.
2014-07-02 17:12:07,202 WARN org.apache.giraph.zk.ZooKeeperManager: onlineZooKeeperServers:
Forced a shutdown hook kill of the ZooKeeper process.
2014-07-02 17:12:07,518 INFO org.apache.zookeeper.ClientCnxn: Unable to read additional data
from server sessionid 0x146f806378d0000, likely server has closed socket, closing socket connection
and attempting reconnect
2014-07-02 17:12:07,518 INFO org.apache.giraph.zk.ZooKeeperManager: onlineZooKeeperServers:
ZooKeeper process exited with 143 (note that 143 typically means killed).
===========================================================================








The second one, task_201407021636_0006_m_000001, goes to the FAILED state.  syslog:
===========================================================================
2014-07-02 17:01:38,016 INFO org.apache.hadoop.util.NativeCodeLoader: Loaded the native-hadoop
library
2014-07-02 17:01:38,203 WARN org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Source name
ugi already exists!
2014-07-02 17:01:38,379 INFO org.apache.giraph.graph.GraphTaskManager: setup: Log level remains
at info
2014-07-02 17:01:40,280 INFO org.apache.giraph.graph.GraphTaskManager: Distributed cache is
empty. Assuming fatjar.
2014-07-02 17:01:40,281 INFO org.apache.giraph.graph.GraphTaskManager: setup: classpath @
/home/ubuntu/hdfstmp/mapred/local/taskTracker/ubuntu/jobcache/job_201407021636_0006/jars/job.jar
for job Giraph: org.apache.giraph.examples.SimpleShortestPathsVertex
2014-07-02 17:01:40,317 INFO org.apache.giraph.zk.ZooKeeperManager: createCandidateStamp:
Made the directory _bsp/_defaultZkManagerDir/job_201407021636_0006
2014-07-02 17:01:40,320 INFO org.apache.giraph.zk.ZooKeeperManager: createCandidateStamp:
Creating my filestamp _bsp/_defaultZkManagerDir/job_201407021636_0006/_task/ec2-54-186-5-159.us-west-2.compute.amazonaws.com
1
2014-07-02 17:01:42,109 INFO org.apache.giraph.zk.ZooKeeperManager: getZooKeeperServerList:
For task 1, got file 'zkServerList_ec2-54-186-5-159.us-west-2.compute.amazonaws.com 0 ' (polling
period is 3000)
2014-07-02 17:01:42,313 INFO org.apache.giraph.zk.ZooKeeperManager: getZooKeeperServerList:
Found [ec2-54-186-5-159.us-west-2.compute.amazonaws.com, 0] 2 hosts in filename 'zkServerList_ec2-54-186-5-159.us-west-2.compute.amazonaws.com
0 '
2014-07-02 17:01:42,316 INFO org.apache.giraph.zk.ZooKeeperManager: onlineZooKeeperServers:
Got [ec2-54-186-5-159.us-west-2.compute.amazonaws.com] 1 hosts from 1 ready servers when 1
required (polling period is 3000) on attempt 0
2014-07-02 17:01:42,316 INFO org.apache.giraph.graph.GraphTaskManager: setup: Starting up
BspServiceWorker...
2014-07-02 17:01:42,330 INFO org.apache.giraph.bsp.BspService: BspService: Connecting to ZooKeeper
with job job_201407021636_0006, 1 on ec2-54-186-5-159.us-west-2.compute.amazonaws.com:22181
2014-07-02 17:01:42,337 INFO org.apache.zookeeper.ZooKeeper: Client environment:zookeeper.version=3.3.3-1073969,
built on 02/23/2011 22:27 GMT
2014-07-02 17:01:42,337 INFO org.apache.zookeeper.ZooKeeper: Client environment:host.name=ec2-54-186-5-159.us-west-2.compute.amazonaws.com
2014-07-02 17:01:42,337 INFO org.apache.zookeeper.ZooKeeper: Client environment:java.version=1.7.0_60
2014-07-02 17:01:42,337 INFO org.apache.zookeeper.ZooKeeper: Client environment:java.vendor=Oracle
Corporation
2014-07-02 17:01:42,337 INFO org.apache.zookeeper.ZooKeeper: Client environment:java.home=/usr/lib/jvm/java-7-oracle/jre
2014-07-02 17:01:42,338 INFO org.apache.zookeeper.ZooKeeper: Client
 environment:java.class.path=/home/ubuntu/hdfstmp/mapred/local/taskTracker/ubuntu/jobcache/job_201407021636_0006/jars/classes:/home/ubuntu/hdfstmp/mapred/local/taskTracker/ubuntu/jobcache/job_201407021636_0006/jars:/home/ubuntu/hdfstmp/mapred/local/taskTracker/ubuntu/jobcache/job_201407021636_0006/attempt_201407021636_0006_m_000001_0/work:/home/ubuntu/hadoop/bin/../conf:/usr/lib/jvm/java-7-oracle/lib/tools.jar:/home/ubuntu/hadoop/bin/..:/home/ubuntu/hadoop/bin/../hadoop-core-0.20.203.0.jar:/home/ubuntu/hadoop/bin/../lib/aspectjrt-1.6.5.jar:/home/ubuntu/hadoop/bin/../lib/aspectjtools-1.6.5.jar:/home/ubuntu/hadoop/bin/../lib/commons-beanutils-1.7.0.jar:/home/ubuntu/hadoop/bin/../lib/commons-beanutils-core-1.8.0.jar:/home/ubuntu/hadoop/bin/../lib/commons-cli-1.2.jar:/home/ubuntu/hadoop/bin/../lib/commons-codec-1.4.jar:/home/ubuntu/hadoop/bin/../lib/commons-collections-3.2.1.jar:/home/ubuntu/hadoop/bin/../lib/commons-configuration-1.6.jar:/home/ubuntu/hadoo
p/bin/../lib/commons-daemon-1.0.1.jar:/home/ubuntu/hadoop/bin/../lib/commons-digester-1.8.jar:/home/ubuntu/hadoop/bin/../lib/commons-el-1.0.jar:/home/ubuntu/hadoop/bin/../lib/commons-httpclient-3.0.1.jar:/home/ubuntu/hadoop/bin/../lib/commons-lang-2.4.jar:/home/ubuntu/hadoop/bin/../lib/commons-logging-1.1.1.jar:/home/ubuntu/hadoop/bin/../lib/commons-logging-api-1.0.4.jar:/home/ubuntu/hadoop/bin/../lib/commons-math-2.1.jar:/home/ubuntu/hadoop/bin/../lib/commons-net-1.4.1.jar:/home/ubuntu/hadoop/bin/../lib/core-3.1.1.jar:/home/ubuntu/hadoop/bin/../lib/hsqldb-1.8.0.10.jar:/home/ubuntu/hadoop/bin/../lib/jackson-core-asl-1.0.1.jar:/home/ubuntu/hadoop/bin/../lib/jackson-mapper-asl-1.0.1.jar:/home/ubuntu/hadoop/bin/../lib/jasper-compiler-5.5.12.jar:/home/ubuntu/hadoop/bin/../lib/jasper-runtime-5.5.12.jar:/home/ubuntu/hadoop/bin/../lib/jets3t-0.6.1.jar:/home/ubuntu/hadoop/bin/../lib/jetty-6.1.26.jar:/home/ubuntu/hadoop/bin/../lib/jetty-util-6.1.26.jar:/home/ubu
ntu/hadoop/bin/../lib/jsch-0.1.42.jar:/home/ubuntu/hadoop/bin/../lib/junit-4.5.jar:/home/ubuntu/hadoop/bin/../lib/kfs-0.2.2.jar:/home/ubuntu/hadoop/bin/../lib/log4j-1.2.15.jar:/home/ubuntu/hadoop/bin/../lib/mockito-all-1.8.5.jar:/home/ubuntu/hadoop/bin/../lib/oro-2.0.8.jar:/home/ubuntu/hadoop/bin/../lib/servlet-api-2.5-20081211.jar:/home/ubuntu/hadoop/bin/../lib/slf4j-api-1.4.3.jar:/home/ubuntu/hadoop/bin/../lib/slf4j-log4j12-1.4.3.jar:/home/ubuntu/hadoop/bin/../lib/xmlenc-0.52.jar:/home/ubuntu/hadoop/bin/../lib/jsp-2.1/jsp-2.1.jar:/home/ubuntu/hadoop/bin/../lib/jsp-2.1/jsp-api-2.1.jar
2014-07-02 17:01:42,338 INFO org.apache.zookeeper.ZooKeeper: Client environment:java.library.path=/home/ubuntu/hadoop/bin/../lib/native/Linux-amd64-64:/home/ubuntu/hdfstmp/mapred/local/taskTracker/ubuntu/jobcache/job_201407021636_0006/attempt_201407021636_0006_m_000001_0/work
2014-07-02 17:01:42,338 INFO org.apache.zookeeper.ZooKeeper: Client environment:java.io.tmpdir=/home/ubuntu/hdfstmp/mapred/local/taskTracker/ubuntu/jobcache/job_201407021636_0006/attempt_201407021636_0006_m_000001_0/work/tmp
2014-07-02 17:01:42,338 INFO org.apache.zookeeper.ZooKeeper: Client environment:java.compiler=<NA>
2014-07-02 17:01:42,338 INFO org.apache.zookeeper.ZooKeeper: Client environment:os.name=Linux
2014-07-02 17:01:42,338 INFO org.apache.zookeeper.ZooKeeper: Client environment:os.arch=amd64
2014-07-02 17:01:42,338 INFO org.apache.zookeeper.ZooKeeper: Client environment:os.version=3.2.0-58-virtual
2014-07-02 17:01:42,338 INFO org.apache.zookeeper.ZooKeeper: Client environment:user.name=ubuntu
2014-07-02 17:01:42,338 INFO org.apache.zookeeper.ZooKeeper: Client environment:user.home=/home/ubuntu
2014-07-02 17:01:42,338 INFO org.apache.zookeeper.ZooKeeper: Client environment:user.dir=/home/ubuntu/hdfstmp/mapred/local/taskTracker/ubuntu/jobcache/job_201407021636_0006/attempt_201407021636_0006_m_000001_0/work
2014-07-02 17:01:42,339 INFO org.apache.zookeeper.ZooKeeper: Initiating client connection,
connectString=ec2-54-186-5-159.us-west-2.compute.amazonaws.com:22181 sessionTimeout=60000
watcher=org.apache.giraph.worker.BspServiceWorker@54a5733f
2014-07-02 17:01:42,347 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to
server ec2-54-186-5-159.us-west-2.compute.amazonaws.com/172.31.45.24:22181
2014-07-02 17:01:42,347 INFO org.apache.zookeeper.ClientCnxn: Socket connection established
to ec2-54-186-5-159.us-west-2.compute.amazonaws.com/172.31.45.24:22181, initiating session
2014-07-02 17:01:42,812 INFO org.apache.zookeeper.ClientCnxn: Session establishment complete
on server ec2-54-186-5-159.us-west-2.compute.amazonaws.com/172.31.45.24:22181, sessionid =
0x146f806378d0001, negotiated timeout = 600000
2014-07-02 17:01:42,814 INFO org.apache.giraph.bsp.BspService: process: Asynchronous connection
complete.
2014-07-02 17:01:42,819 INFO org.apache.giraph.comm.netty.NettyWorkerServer: createMessageStoreFactory:
Using ByteArrayMessagesPerVertexStore since there is no combiner
2014-07-02 17:01:42,881 INFO org.apache.giraph.comm.netty.NettyServer: NettyServer: Using
execution handler with 8 threads after requestFrameDecoder.
2014-07-02 17:01:42,904 INFO org.apache.giraph.comm.netty.NettyServer: start: Started server
communication server: ec2-54-186-5-159.us-west-2.compute.amazonaws.com/172.31.45.24:30001
with up to 16 threads on bind attempt 0 with sendBufferSize = 32768 receiveBufferSize = 524288
backlog = 1
2014-07-02 17:01:42,907 INFO org.apache.giraph.comm.netty.NettyClient: NettyClient: Using
execution handler with 8 threads after requestEncoder.
2014-07-02 17:01:42,914 INFO org.apache.giraph.graph.GraphTaskManager: setup: Registering
health of this worker...
2014-07-02 17:01:45,049 INFO org.apache.giraph.bsp.BspService: process: applicationAttemptChanged
signaled
2014-07-02 17:01:45,068 WARN org.apache.giraph.bsp.BspService: process: Unknown and unprocessed
event (path=/_hadoopBsp/job_201407021636_0006/_applicationAttemptsDir/0/_superstepDir, type=NodeChildrenChanged,
state=SyncConnected)
2014-07-02 17:01:45,076 INFO org.apache.giraph.worker.BspServiceWorker: registerHealth: Created
my health node for attempt=0, superstep=-1 with /_hadoopBsp/job_201407021636_0006/_applicationAttemptsDir/0/_superstepDir/-1/_workerHealthyDir/ec2-54-186-5-159.us-west-2.compute.amazonaws.com_1
and workerInfo= Worker(hostname=ec2-54-186-5-159.us-west-2.compute.amazonaws.com, MRtaskID=1,
port=30001)
2014-07-02 17:01:45,183 INFO org.apache.giraph.comm.netty.NettyServer: start: Using Netty
without authentication.
2014-07-02 17:01:45,339 INFO org.apache.giraph.bsp.BspService: process: partitionAssignmentsReadyChanged
(partitions are assigned)
2014-07-02 17:01:45,345 INFO org.apache.giraph.worker.BspServiceWorker: startSuperstep: Master(hostname=ec2-54-186-5-159.us-west-2.compute.amazonaws.com,
MRtaskID=0, port=30000)
2014-07-02 17:01:45,345 INFO org.apache.giraph.worker.BspServiceWorker: startSuperstep: Ready
for computation on superstep -1 since worker selection and vertex range assignments are done
in /_hadoopBsp/job_201407021636_0006/_applicationAttemptsDir/0/_superstepDir/-1/_addressesAndPartitions
2014-07-02 17:01:45,346 INFO org.apache.giraph.comm.netty.NettyClient: Using Netty without
authentication.
2014-07-02 17:01:45,354 INFO org.apache.giraph.comm.netty.NettyClient: connectAllAddresses:
Successfully added 1 connections, (1 total connected) 0 failed, 0 failures total.
2014-07-02 17:01:45,359 INFO org.apache.giraph.worker.BspServiceWorker: loadInputSplits: Using
1 thread(s), originally 1 threads(s) for 2 total splits.
2014-07-02 17:01:45,362 INFO org.apache.giraph.comm.SendPartitionCache: SendPartitionCache:
maxVerticesPerTransfer = 10000
2014-07-02 17:01:45,362 INFO org.apache.giraph.comm.SendPartitionCache: SendPartitionCache:
maxEdgesPerTransfer = 80000
2014-07-02 17:01:45,372 INFO org.apache.giraph.worker.InputSplitsHandler: reserveInputSplit:
Reserved input split path /_hadoopBsp/job_201407021636_0006/_vertexInputSplitDir/0, overall
roughly 0.0% input splits reserved
2014-07-02 17:01:45,373 INFO org.apache.giraph.worker.InputSplitsCallable: getInputSplit:
Reserved /_hadoopBsp/job_201407021636_0006/_vertexInputSplitDir/0 from ZooKeeper and got input
split 'hdfs://ec2-54-186-5-159.us-west-2.compute.amazonaws.com:54310/user/ubuntu/input/CA.txt:0+67108864'
2014-07-02 17:01:46,615 INFO org.apache.giraph.worker.VertexInputSplitsCallable: readVertexInputSplit:
Loaded 250000 vertices at 200412.01085983627 vertices/sec 697447 edges at 560085.1663971642
edges/sec Memory (free/total/max) = 112.94M / 182.50M / 182.50M
2014-07-02 17:01:47,440 INFO org.apache.giraph.worker.VertexInputSplitsCallable: readVertexInputSplit:
Loaded 500000 vertices at 241131.36490688394 vertices/sec 1419367 edges at 685221.3493060106
edges/sec Memory (free/total/max) = 45.07M / 187.50M / 187.50M
2014-07-02 17:01:51,322 INFO org.apache.giraph.worker.VertexInputSplitsCallable: readVertexInputSplit:
Loaded 750000 vertices at 125921.72750649283 vertices/sec 2149814 edges at 361077.323111284
edges/sec Memory (free/total/max) = 16.73M / 189.50M / 189.50M
2014-07-02 17:01:55,205 ERROR org.apache.giraph.worker.BspServiceWorker: unregisterHealth:
Got failure, unregistering health on /_hadoopBsp/job_201407021636_0006/_applicationAttemptsDir/0/_superstepDir/-1/_workerHealthyDir/ec2-54-186-5-159.us-west-2.compute.amazonaws.com_1
on superstep -1
2014-07-02 17:01:55,213 INFO org.apache.hadoop.mapred.TaskLogsTruncater: Initializing logs'
truncater with mapRetainSize=-1 and reduceRetainSize=-1
2014-07-02 17:01:55,716 INFO org.apache.hadoop.io.nativeio.NativeIO: Initialized cache for
UID to User mapping with a cache timeout of 14400 seconds.
2014-07-02 17:01:55,717 INFO org.apache.hadoop.io.nativeio.NativeIO: Got UserName ubuntu for
UID 1000 from the native implementation
2014-07-02 17:01:55,718 WARN org.apache.hadoop.mapred.Child: Error running child
java.lang.IllegalStateException: run: Caught an unrecoverable exception waitFor: ExecutionException
occurred while waiting for org.apache.giraph.utils.ProgressableUtils$FutureWaitable@36df8f5
at org.apache.giraph.graph.GraphMapper.run(GraphMapper.java:102) at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:763)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:369) at org.apache.hadoop.mapred.Child$4.run(Child.java:259)
at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059) at
org.apache.hadoop.mapred.Child.main(Child.java:253)
Caused by: java.lang.IllegalStateException: waitFor: ExecutionException occurred while waiting
for org.apache.giraph.utils.ProgressableUtils$FutureWaitable@36df8f5 at org.apache.giraph.utils.ProgressableUtils.waitFor(ProgressableUtils.java:151)
at org.apache.giraph.utils.ProgressableUtils.waitForever(ProgressableUtils.java:111) at org.apache.giraph.utils.ProgressableUtils.getFutureResult(ProgressableUtils.java:73)
at org.apache.giraph.utils.ProgressableUtils.getResultsWithNCallables(ProgressableUtils.java:192)
at org.apache.giraph.worker.BspServiceWorker.loadInputSplits(BspServiceWorker.java:276) at
org.apache.giraph.worker.BspServiceWorker.loadVertices(BspServiceWorker.java:323) at org.apache.giraph.worker.BspServiceWorker.setup(BspServiceWorker.java:506)
at org.apache.giraph.graph.GraphTaskManager.execute(GraphTaskManager.java:230) at org.apache.giraph.graph.GraphMapper.run(GraphMapper.java:92)
... 7 more
Caused by: java.util.concurrent.ExecutionException: java.lang.OutOfMemoryError: Java heap
space at java.util.concurrent.FutureTask.report(FutureTask.java:122) at java.util.concurrent.FutureTask.get(FutureTask.java:202)
at org.apache.giraph.utils.ProgressableUtils$FutureWaitable.waitFor(ProgressableUtils.java:271)
at org.apache.giraph.utils.ProgressableUtils.waitFor(ProgressableUtils.java:143) ... 15 more
Caused by: java.lang.OutOfMemoryError: Java heap space at java.util.concurrent.ConcurrentHashMap$Segment.rehash(ConcurrentHashMap.java:501)
at java.util.concurrent.ConcurrentHashMap$Segment.put(ConcurrentHashMap.java:460) at java.util.concurrent.ConcurrentHashMap.put(ConcurrentHashMap.java:1130)
at org.apache.giraph.partition.SimplePartition.addPartition(SimplePartition.java:87) at org.apache.giraph.partition.SimplePartitionStore.addPartition(SimplePartitionStore.java:71)
at org.apache.giraph.comm.requests.SendVertexRequest.doRequest(SendVertexRequest.java:81)
at org.apache.giraph.comm.netty.NettyWorkerClientRequestProcessor.doRequest(NettyWorkerClientRequestProcessor.java:470)
at org.apache.giraph.comm.netty.NettyWorkerClientRequestProcessor.sendPartitionRequest(NettyWorkerClientRequestProcessor.java:203)
at org.apache.giraph.comm.netty.NettyWorkerClientRequestProcessor.sendVertexRequest(NettyWorkerClientRequestProcessor.java:267)
at
 org.apache.giraph.worker.VertexInputSplitsCallable.readInputSplit(VertexInputSplitsCallable.java:140)
at org.apache.giraph.worker.InputSplitsCallable.loadInputSplit(InputSplitsCallable.java:220)
at org.apache.giraph.worker.InputSplitsCallable.call(InputSplitsCallable.java:161) at org.apache.giraph.worker.InputSplitsCallable.call(InputSplitsCallable.java:58)
at org.apache.giraph.utils.LogStacktraceCallable.call(LogStacktraceCallable.java:51) at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
2014-07-02 17:01:55,722 INFO org.apache.hadoop.mapred.Task: Runnning cleanup for the task
===========================================================================




I've tried increasing Java Heap Space in hadoop/conf/mapred-site.xml by adding this:
  <property>
    <name>mapred.child.java.opts</name>
   
 <value>-Xmx1024m</value>
  </property>

But that just caused the entire job to fail from the start.

Before using this version of Giraph, I used 1.0.0 and 1.1.0-RC0 and those 
versions provide me with more and different errors to debug that relate 
to problems with Hadoop itself.  So the Giraph version I'm currently 
using seems to be the best for me because these errors seem more 
manageable.


What can I do to fix this error?  I thought Giraph was built for large scale graph processing
so I suppose this problem was encountered before by 
someone testing large graphs.  I searched through the mailing archives and couldn't find
anything though.

I can provide more information if you need it.  Thanks a lot.
Bryan Rowe

Mime
View raw message