giraph-user mailing list archives

From Young Han <young....@uwaterloo.ca>
Subject Re: Errors with large graphs (continued, new email)
Date Thu, 03 Jul 2014 21:59:43 GMT
Ahh yes, that's another common error. Try adding this to your mapred-site.xml:

<property>
  <name>mapreduce.job.counters.max</name>
  <value>1000000</value>
</property>
<property>
  <name>mapreduce.job.counters.limit</name>
  <value>1000000</value>
</property>

Young


On Thu, Jul 3, 2014 at 5:49 PM, Bryan Rowe <bryanrowe09@gmail.com> wrote:

> I just changed my mapred-site.xml file to increase the heap size:
> <configuration>
> <property>
> <name>mapred.job.tracker</name>
> <value>hdfs://ec2-54-186-5-159.us-west-2.compute.amazonaws.com:54311
> </value>
> </property>
>
> <property>
> <name>mapred.tasktracker.map.tasks.maximum</name>
> <value>4</value>
> </property>
>
> <property>
> <name>mapred.map.tasks</name>
> <value>4</value>
> </property>
>
> <property>
> <name>mapred.child.java.opts</name>
> <value>-Xmx1024m</value>
> </property>
> </configuration>
>
>
> I ran a Hadoop job and it seemed to be working for the first 2 minutes
> because it was computing supersteps.  However, after 2 minutes it failed.
> Part of the reason is that the Hadoop counters exceeded 120.  There were
> other errors, though, and I'm not sure if those are related to the counters
> problem.
>
> Here is an image album of the errors:  http://imgur.com/a/mAqn3
> Also, here is a link to the Hadoop job that failed:
> http://ec2-54-186-5-159.us-west-2.compute.amazonaws.com:50030/jobdetails.jsp?jobid=job_201407032123_0002&refresh=30
>
> I'll keep my instance up and running so the link should work.
> What would you recommend next?
>
> Thanks,
> Bryan
>
>
> On Thu, Jul 3, 2014 at 2:19 PM, Young Han <young.han@uwaterloo.ca> wrote:
>
>> What sort of error does it give? Did you try restarting Hadoop after
>> setting it? (stop-all.sh, start-all.sh, hadoop dfs -safemode wait).
>>
>> Young
>>
>>
>> On Thu, Jul 3, 2014 at 5:08 PM, Bryan Rowe <bryanrowe09@gmail.com> wrote:
>>
>>> Yes, before conversion they were also sorted by vertex ID.
>>> 0       1       1
>>> 0       4       1
>>> 0       5       1
>>> 1       0       1
>>> 1       2       1
>>> 1       3       1
>>> 2       1       1
>>> 2       6       1
>>> 2       7       1
>>> 3       8       1
>>>
>>> That was the format before conversion with format [source destination
>>> weight] of edges.
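For reference, the conversion Bryan describes (plain "source destination weight" triples into the Quick Start's JSON vertex format read by JsonLongDoubleFloatDoubleVertexInputFormat) can be sketched as below. Initializing every vertex value to 0 is an assumption borrowed from the Quick Start's tiny example graph; the actual conversion script may have differed.

```python
# Sketch: convert sorted "source destination weight" edge triples into the
# JSON lines expected by JsonLongDoubleFloatDoubleVertexInputFormat, i.e.
# one [vertex_id, vertex_value, [[dest, weight], ...]] array per line.
# The vertex value of 0 is an assumption, not taken from this thread.
import json
from itertools import groupby

def edges_to_json_lines(triples):
    """triples: iterable of (source, dest, weight), pre-sorted by source."""
    for source, group in groupby(triples, key=lambda t: t[0]):
        out_edges = [[dest, float(w)] for _, dest, w in group]
        yield json.dumps([source, 0, out_edges])

sample = [(0, 1, 1), (0, 4, 1), (0, 5, 1), (1, 0, 1), (1, 2, 1), (1, 3, 1)]
for line in edges_to_json_lines(sample):
    print(line)
# First line: [0, 0, [[1, 1.0], [4, 1.0], [5, 1.0]]]
```

Note that groupby only works correctly here because the triples are already sorted by source ID, which matches the sorted input shown above.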
>>> I'm using a medium-tier EC2 instance, c3.2xlarge.  It has 8 vCPUs and 15
>>> GiB of memory according to Amazon's website.
>>>
>>> Any suggestions on tweaking Xmx?  I tried doing that without any luck;
>>> it just didn't run the map phase and then failed.
>>>
>>>
>>> On Thu, Jul 3, 2014 at 1:57 PM, Young Han <young.han@uwaterloo.ca>
>>> wrote:
>>>
>>>> From the other thread...
>>>>
>>>> Yeah, your input format looks correct. Did you have the graph sorted by
>>>> source vertex IDs before conversion? (I'm not sure if duplicate entries
>>>> with the same source ID matter, but just in case.)
>>>>
>>>> They're all out of memory errors, so I think Xmx is the culprit. What
>>>> type of EC2 instances are you using? You probably want to use something
>>>> larger than t1.micro or m1.small.
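As a rough sanity check on heap sizing, here is a sketch; the slot count and -Xmx figure echo the mapred-site.xml settings quoted above, the ~15 GiB figure is the c3.2xlarge spec mentioned elsewhere in the thread, and the 2 GiB reserve for the OS and Hadoop daemons is an illustrative assumption:

```python
# Rough heap-budget sanity check for a single-node Hadoop/Giraph setup.
# Values mirror those discussed in this thread (4 map slots, -Xmx1024m,
# ~15 GiB on a c3.2xlarge); the 2 GiB reserve is an assumed figure.
def heap_budget_ok(map_slots, xmx_mb, instance_ram_mb, reserve_mb=2048):
    """True if all concurrent map-task heaps plus a reserve fit in RAM."""
    total_heap_mb = map_slots * xmx_mb
    return total_heap_mb + reserve_mb <= instance_ram_mb

# 4 slots * 1024 MB = 4 GB of task heap on a ~15 GiB box: lots of headroom,
# suggesting -Xmx could be raised well above 1024m on this instance.
print(heap_budget_ok(4, 1024, 15 * 1024))  # → True
print(heap_budget_ok(4, 3072, 15 * 1024))  # 12 GB + reserve still fits → True
print(heap_budget_ok(4, 4096, 15 * 1024))  # 16 GB + reserve exceeds RAM → False
```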
>>>>
>>>> Young
>>>>
>>>>
>>>> On Thu, Jul 3, 2014 at 4:53 PM, Bryan Rowe <bryanrowe09@gmail.com>
>>>> wrote:
>>>>
>>>>> First of all, I started this email thread with my old email from
>>>>> Yahoo, which was a mistake because it kept sending out duplicates.  Sorry
>>>>> for the inconvenience, but I'll continue it using this email thread from
>>>>> now on.
>>>>>
>>>>> *I originally posted this:*
>>>>> Hello,
>>>>>
>>>>> Giraph: release-1.0.0-RC3
>>>>>
>>>>> In short, when I use large graphs with the Shortest Paths example, it
>>>>> fails.  But when I use the small graph provided on the Quick Start guide,
>>>>> it succeeds.
>>>>> I converted all of my large graphs into the format shown in the Quick
>>>>> Start guide to simplify things.
>>>>> I'm using a one-node setup.
>>>>>
>>>>>
>>>>> Here is the command I'm using:
>>>>> hadoop jar
>>>>> $GIRAPH_HOME/giraph-examples/target/giraph-examples-1.0.0-for-hadoop-0.20.203.0-jar-with-dependencies.jar
>>>>>
>>>>> org.apache.giraph.GiraphRunner
>>>>> org.apache.giraph.examples.SimpleShortestPathsVertex
>>>>> -vif
>>>>> org.apache.giraph.io.formats.JsonLongDoubleFloatDoubleVertexInputFormat
>>>>>  -vip /user/ubuntu/input/CA.txt -of
>>>>> org.apache.giraph.io.formats.IdWithValueTextOutputFormat
>>>>>  -op /user/ubuntu/output/shortestpaths
>>>>> -w 1
>>>>>
>>>>> (all on one line)
>>>>>
>>>>> CA.txt is a large graph file: 96,026,228 bytes
>>>>>
>>>>> The job fails in 10mins, 46sec.
>>>>>
>>>>> Two Map tasks are created when run.
>>>>> The first one, task_201407021636_0006_m_000000, is KILLED.  syslog:
>>>>>
>>>>> ===========================================================================
>>>>>
>>>>> 2014-07-02 17:01:34,757 INFO org.apache.hadoop.util.NativeCodeLoader:
Loaded the native-hadoop library
>>>>> 2014-07-02 17:01:34,945 WARN org.apache.hadoop.metrics2.impl.MetricsSystemImpl:
Source name ugi already exists!
>>>>> 2014-07-02 17:01:35,127 INFO org.apache.giraph.graph.GraphTaskManager:
setup: Log level remains at info
>>>>> 2014-07-02 17:01:35,159 INFO org.apache.giraph.graph.GraphTaskManager:
Distributed cache is empty. Assuming fatjar.
>>>>> 2014-07-02 17:01:35,159 INFO org.apache.giraph.graph.GraphTaskManager:
setup: classpath @ /home/ubuntu/hdfstmp/mapred/local/taskTracker/ubuntu/jobcache/job_201407021636_0006/jars/job.jar
for job Giraph: org.apache.giraph.examples.SimpleShortestPathsVertex
>>>>> 2014-07-02 17:01:35,201 INFO org.apache.giraph.zk.ZooKeeperManager: createCandidateStamp:
Made the directory _bsp/_defaultZkManagerDir/job_201407021636_0006
>>>>> 2014-07-02 17:01:35,204 INFO org.apache.giraph.zk.ZooKeeperManager: createCandidateStamp:
Creating my filestamp _bsp/_defaultZkManagerDir/job_201407021636_0006/_task/ec2-54-186-5-159.us-west-2.compute.amazonaws.com
0
>>>>> 2014-07-02 17:01:35,219 INFO org.apache.giraph.zk.ZooKeeperManager: getZooKeeperServerList:
Got [ec2-54-186-5-159.us-west-2.compute.amazonaws.com] 1 hosts from 1 candidates when 1 required
(polling period is 3000) on attempt 0
>>>>> 2014-07-02 17:01:35,220 INFO org.apache.giraph.zk.ZooKeeperManager: createZooKeeperServerList:
Creating the final ZooKeeper file '_bsp/_defaultZkManagerDir/job_201407021636_0006/zkServerList_ec2-54-186-5-159.us-west-2.compute.amazonaws.com
0 '
>>>>> 2014-07-02 17:01:35,228 INFO org.apache.giraph.zk.ZooKeeperManager: getZooKeeperServerList:
For task 0, got file 'zkServerList_ec2-54-186-5-159.us-west-2.compute.amazonaws.com 0 ' (polling
period is 3000)
>>>>> 2014-07-02 17:01:35,228 INFO org.apache.giraph.zk.ZooKeeperManager: getZooKeeperServerList:
Found [ec2-54-186-5-159.us-west-2.compute.amazonaws.com, 0] 2 hosts in filename 'zkServerList_ec2-54-186-5-159.us-west-2.compute.amazonaws.com
0 '
>>>>> 2014-07-02 17:01:35,229 INFO org.apache.giraph.zk.ZooKeeperManager: onlineZooKeeperServers:
Trying to delete old directory /home/ubuntu/hdfstmp/mapred/local/taskTracker/ubuntu/jobcache/job_201407021636_0006/work/_bspZooKeeper
>>>>> 2014-07-02 17:01:35,234 INFO org.apache.giraph.zk.ZooKeeperManager: generateZooKeeperConfigFile:
Creating file /home/ubuntu/hdfstmp/mapred/local/taskTracker/ubuntu/jobcache/job_201407021636_0006/work/_bspZooKeeper/zoo.cfg
in /home/ubuntu/hdfstmp/mapred/local/taskTracker/ubuntu/jobcache/job_201407021636_0006/work/_bspZooKeeper
with base port 22181
>>>>> 2014-07-02 17:01:35,234 INFO org.apache.giraph.zk.ZooKeeperManager: generateZooKeeperConfigFile:
Make directory of _bspZooKeeper = true
>>>>> 2014-07-02 17:01:35,235 INFO org.apache.giraph.zk.ZooKeeperManager: generateZooKeeperConfigFile:
Delete of zoo.cfg = false
>>>>> 2014-07-02 17:01:35,236 INFO org.apache.giraph.zk.ZooKeeperManager: onlineZooKeeperServers:
Attempting to start ZooKeeper server with command [/usr/lib/jvm/java-7-oracle/jre/bin/java,
-Xmx512m, -XX:ParallelGCThreads=4, -XX:+UseConcMarkSweepGC, -XX:CMSInitiatingOccupancyFraction=70,
-XX:MaxGCPauseMillis=100, -cp, /home/ubuntu/hdfstmp/mapred/local/taskTracker/ubuntu/jobcache/job_201407021636_0006/jars/job.jar,
org.apache.zookeeper.server.quorum.QuorumPeerMain, /home/ubuntu/hdfstmp/mapred/local/taskTracker/ubuntu/jobcache/job_201407021636_0006/work/_bspZooKeeper/zoo.cfg]
in directory /home/ubuntu/hdfstmp/mapred/local/taskTracker/ubuntu/jobcache/job_201407021636_0006/work/_bspZooKeeper
>>>>> 2014-07-02 17:01:35,238 INFO org.apache.giraph.zk.ZooKeeperManager: onlineZooKeeperServers:
Shutdown hook added.
>>>>> 2014-07-02 17:01:35,238 INFO org.apache.giraph.zk.ZooKeeperManager: onlineZooKeeperServers:
Connect attempt 0 of 10 max trying to connect to ec2-54-186-5-159.us-west-2.compute.amazonaws.com:22181
with poll msecs = 3000
>>>>> 2014-07-02 17:01:35,241 WARN org.apache.giraph.zk.ZooKeeperManager: onlineZooKeeperServers:
Got ConnectException
>>>>> java.net.ConnectException: Connection refused
>>>>> 	at java.net.PlainSocketImpl.socketConnect(Native Method)
>>>>> 	at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
>>>>> 	at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)
>>>>> 	at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
>>>>> 	at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
>>>>> 	at java.net.Socket.connect(Socket.java:579)
>>>>> 	at org.apache.giraph.zk.ZooKeeperManager.onlineZooKeeperServers(ZooKeeperManager.java:701)
>>>>> 	at org.apache.giraph.graph.GraphTaskManager.startZooKeeperManager(GraphTaskManager.java:357)
>>>>> 	at org.apache.giraph.graph.GraphTaskManager.setup(GraphTaskManager.java:188)
>>>>> 	at org.apache.giraph.graph.GraphMapper.setup(GraphMapper.java:60)
>>>>> 	at org.apache.giraph.graph.GraphMapper.run(GraphMapper.java:90)
>>>>> 	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:763)
>>>>> 	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:369)
>>>>> 	at org.apache.hadoop.mapred.Child$4.run(Child.java:259)
>>>>> 	at java.security.AccessController.doPrivileged(Native Method)
>>>>> 	at javax.security.auth.Subject.doAs(Subject.java:415)
>>>>> 	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
>>>>> 	at org.apache.hadoop.mapred.Child.main(Child.java:253)
>>>>> 2014-07-02 17:01:38,242 INFO org.apache.giraph.zk.ZooKeeperManager: onlineZooKeeperServers:
Connect attempt 1 of 10 max trying to connect to ec2-54-186-5-159.us-west-2.compute.amazonaws.com:22181
with poll msecs = 3000
>>>>> 2014-07-02 17:01:38,243 INFO org.apache.giraph.zk.ZooKeeperManager: onlineZooKeeperServers:
Connected to ec2-54-186-5-159.us-west-2.compute.amazonaws.com/172.31.45.24:22181!
>>>>> 2014-07-02 17:01:38,243 INFO org.apache.giraph.zk.ZooKeeperManager: onlineZooKeeperServers:
Creating my filestamp _bsp/_defaultZkManagerDir/job_201407021636_0006/_zkServer/ec2-54-186-5-159.us-west-2.compute.amazonaws.com
0
>>>>> 2014-07-02 17:01:40,249 INFO org.apache.giraph.graph.GraphTaskManager:
setup: Chosen to run ZooKeeper...
>>>>> 2014-07-02 17:01:40,249 INFO org.apache.giraph.graph.GraphTaskManager:
setup: Starting up BspServiceMaster (master thread)...
>>>>> 2014-07-02 17:01:40,260 INFO org.apache.giraph.bsp.BspService: BspService:
Connecting to ZooKeeper with job job_201407021636_0006, 0 on ec2-54-186-5-159.us-west-2.compute.amazonaws.com:22181
>>>>> 2014-07-02 17:01:40,270 INFO org.apache.zookeeper.ZooKeeper: Client environment:zookeeper.version=3.3.3-1073969,
built on 02/23/2011 22:27 GMT
>>>>> 2014-07-02 17:01:40,270 INFO org.apache.zookeeper.ZooKeeper: Client environment:host.name=ec2-54-186-5-159.us-west-2.compute.amazonaws.com
>>>>> 2014-07-02 17:01:40,270 INFO org.apache.zookeeper.ZooKeeper: Client environment:java.version=1.7.0_60
>>>>> 2014-07-02 17:01:40,270 INFO org.apache.zookeeper.ZooKeeper: Client environment:java.vendor=Oracle
Corporation
>>>>> 2014-07-02 17:01:40,271 INFO org.apache.zookeeper.ZooKeeper: Client environment:java.home=/usr/lib/jvm/java-7-oracle/jre
>>>>> 2014-07-02 17:01:40,271 INFO org.apache.zookeeper.ZooKeeper: Client
>>>>>  environment:java.class.path=/home/ubuntu/hdfstmp/mapred/local/taskTracker/ubuntu/jobcache/job_201407021636_0006/jars/classes:/home/ubuntu/hdfstmp/mapred/local/taskTracker/ubuntu/jobcache/job_201407021636_0006/jars:/home/ubuntu/hdfstmp/mapred/local/taskTracker/ubuntu/jobcache/job_201407021636_0006/attempt_201407021636_0006_m_000000_0/work:/home/ubuntu/hadoop/bin/../conf:/usr/lib/jvm/java-7-oracle/lib/tools.jar:/home/ubuntu/hadoop/bin/..:/home/ubuntu/hadoop/bin/../hadoop-core-0.20.203.0.jar:/home/ubuntu/hadoop/bin/../lib/aspectjrt-1.6.5.jar:/home/ubuntu/hadoop/bin/../lib/aspectjtools-1.6.5.jar:/home/ubuntu/hadoop/bin/../lib/commons-beanutils-1.7.0.jar:/home/ubuntu/hadoop/bin/../lib/commons-beanutils-core-1.8.0.jar:/home/ubuntu/hadoop/bin/../lib/commons-cli-1.2.jar:/home/ubuntu/hadoop/bin/../lib/commons-codec-1.4.jar:/home/ubuntu/hadoop/bin/../lib/commons-collections-3.2.1.jar:/home/ubuntu/hadoop/bin/../lib/commons-configuration-1.6.jar:/home/ubuntu/hadoo
>>>>> p/bin/../lib/commons-daemon-1.0.1.jar:/home/ubuntu/hadoop/bin/../lib/commons-digester-1.8.jar:/home/ubuntu/hadoop/bin/../lib/commons-el-1.0.jar:/home/ubuntu/hadoop/bin/../lib/commons-httpclient-3.0.1.jar:/home/ubuntu/hadoop/bin/../lib/commons-lang-2.4.jar:/home/ubuntu/hadoop/bin/../lib/commons-logging-1.1.1.jar:/home/ubuntu/hadoop/bin/../lib/commons-logging-api-1.0.4.jar:/home/ubuntu/hadoop/bin/../lib/commons-math-2.1.jar:/home/ubuntu/hadoop/bin/../lib/commons-net-1.4.1.jar:/home/ubuntu/hadoop/bin/../lib/core-3.1.1.jar:/home/ubuntu/hadoop/bin/../lib/hsqldb-1.8.0.10.jar:/home/ubuntu/hadoop/bin/../lib/jackson-core-asl-1.0.1.jar:/home/ubuntu/hadoop/bin/../lib/jackson-mapper-asl-1.0.1.jar:/home/ubuntu/hadoop/bin/../lib/jasper-compiler-5.5.12.jar:/home/ubuntu/hadoop/bin/../lib/jasper-runtime-5.5.12.jar:/home/ubuntu/hadoop/bin/../lib/jets3t-0.6.1.jar:/home/ubuntu/hadoop/bin/../lib/jetty-6.1.26.jar:/home/ubuntu/hadoop/bin/../lib/jetty-util-6.1.26.jar:/home/ubu
>>>>> ntu/hadoop/bin/../lib/jsch-0.1.42.jar:/home/ubuntu/hadoop/bin/../lib/junit-4.5.jar:/home/ubuntu/hadoop/bin/../lib/kfs-0.2.2.jar:/home/ubuntu/hadoop/bin/../lib/log4j-1.2.15.jar:/home/ubuntu/hadoop/bin/../lib/mockito-all-1.8.5.jar:/home/ubuntu/hadoop/bin/../lib/oro-2.0.8.jar:/home/ubuntu/hadoop/bin/../lib/servlet-api-2.5-20081211.jar:/home/ubuntu/hadoop/bin/../lib/slf4j-api-1.4.3.jar:/home/ubuntu/hadoop/bin/../lib/slf4j-log4j12-1.4.3.jar:/home/ubuntu/hadoop/bin/../lib/xmlenc-0.52.jar:/home/ubuntu/hadoop/bin/../lib/jsp-2.1/jsp-2.1.jar:/home/ubuntu/hadoop/bin/../lib/jsp-2.1/jsp-api-2.1.jar
>>>>> 2014-07-02 17:01:40,271 INFO org.apache.zookeeper.ZooKeeper: Client environment:java.library.path=/home/ubuntu/hadoop/bin/../lib/native/Linux-amd64-64:/home/ubuntu/hdfstmp/mapred/local/taskTracker/ubuntu/jobcache/job_201407021636_0006/attempt_201407021636_0006_m_000000_0/work
>>>>> 2014-07-02 17:01:40,271 INFO org.apache.zookeeper.ZooKeeper: Client environment:java.io.tmpdir=/home/ubuntu/hdfstmp/mapred/local/taskTracker/ubuntu/jobcache/job_201407021636_0006/attempt_201407021636_0006_m_000000_0/work/tmp
>>>>> 2014-07-02 17:01:40,271 INFO org.apache.zookeeper.ZooKeeper: Client environment:java.compiler=<NA>
>>>>> 2014-07-02 17:01:40,271 INFO org.apache.zookeeper.ZooKeeper: Client environment:os.name=Linux
>>>>> 2014-07-02 17:01:40,271 INFO org.apache.zookeeper.ZooKeeper: Client environment:os.arch=amd64
>>>>> 2014-07-02 17:01:40,271 INFO org.apache.zookeeper.ZooKeeper: Client environment:os.version=3.2.0-58-virtual
>>>>> 2014-07-02 17:01:40,271 INFO org.apache.zookeeper.ZooKeeper: Client environment:user.name=ubuntu
>>>>> 2014-07-02 17:01:40,271 INFO org.apache.zookeeper.ZooKeeper: Client environment:user.home=/home/ubuntu
>>>>> 2014-07-02 17:01:40,271 INFO org.apache.zookeeper.ZooKeeper: Client environment:user.dir=/home/ubuntu/hdfstmp/mapred/local/taskTracker/ubuntu/jobcache/job_201407021636_0006/attempt_201407021636_0006_m_000000_0/work
>>>>> 2014-07-02 17:01:40,272 INFO org.apache.zookeeper.ZooKeeper: Initiating
client connection, connectString=ec2-54-186-5-159.us-west-2.compute.amazonaws.com:22181 sessionTimeout=60000
watcher=org.apache.giraph.master.BspServiceMaster@4deb9df0
>>>>> 2014-07-02 17:01:40,284 INFO org.apache.zookeeper.ClientCnxn: Opening
socket connection to server ec2-54-186-5-159.us-west-2.compute.amazonaws.com/172.31.45.24:22181
>>>>> 2014-07-02
17:01:40,296 INFO org.apache.zookeeper.ClientCnxn: Socket connection established to ec2-54-186-5-159.us-west-2.compute.amazonaws.com/172.31.45.24:22181,
initiating session
>>>>> 2014-07-02 17:01:42,336 INFO org.apache.zookeeper.ClientCnxn: Session
establishment complete on server ec2-54-186-5-159.us-west-2.compute.amazonaws.com/172.31.45.24:22181,
sessionid = 0x146f806378d0000, negotiated timeout = 600000
>>>>> 2014-07-02 17:01:42,337 INFO org.apache.giraph.bsp.BspService: process:
Asynchronous connection complete.
>>>>> 2014-07-02 17:01:42,343 INFO org.apache.giraph.graph.GraphTaskManager:
map: No need to do anything when not a worker
>>>>> 2014-07-02 17:01:42,343 INFO org.apache.giraph.graph.GraphTaskManager:
cleanup: Starting for MASTER_ZOOKEEPER_ONLY
>>>>> 2014-07-02 17:01:45,033 INFO org.apache.giraph.bsp.BspService: getJobState:
Job state already exists (/_hadoopBsp/job_201407021636_0006/_masterJobState)
>>>>> 2014-07-02 17:01:45,038 INFO org.apache.giraph.master.BspServiceMaster:
becomeMaster: First child is '/_hadoopBsp/job_201407021636_0006/_masterElectionDir/ec2-54-186-5-159.us-west-2.compute.amazonaws.com_00000000000'
and my bid is '/_hadoopBsp/job_201407021636_0006/_masterElectionDir/ec2-54-186-5-159.us-west-2.compute.amazonaws.com_00000000000'
>>>>> 2014-07-02 17:01:45,044 INFO org.apache.giraph.bsp.BspService: getApplicationAttempt:
Node /_hadoopBsp/job_201407021636_0006/_applicationAttemptsDir already exists!
>>>>> 2014-07-02 17:01:45,049 INFO org.apache.giraph.bsp.BspService: process:
applicationAttemptChanged signaled
>>>>> 2014-07-02 17:01:45,089 INFO org.apache.giraph.comm.netty.NettyServer:
NettyServer: Using execution handler with 8 threads after requestFrameDecoder.
>>>>> 2014-07-02 17:01:45,111 INFO org.apache.giraph.comm.netty.NettyServer:
start: Started server communication server: ec2-54-186-5-159.us-west-2.compute.amazonaws.com/172.31.45.24:30000
with up to 16 threads on bind attempt 0 with sendBufferSize = 32768 receiveBufferSize = 524288
backlog = 1
>>>>> 2014-07-02 17:01:45,116 INFO org.apache.giraph.comm.netty.NettyClient:
NettyClient: Using execution handler with 8 threads after requestEncoder.
>>>>> 2014-07-02 17:01:45,119 INFO org.apache.giraph.master.BspServiceMaster:
becomeMaster: I am now the master!
>>>>> 2014-07-02 17:01:45,126 INFO org.apache.giraph.bsp.BspService: getApplicationAttempt:
Node /_hadoopBsp/job_201407021636_0006/_applicationAttemptsDir already exists!
>>>>> 2014-07-02 17:01:45,140 INFO org.apache.giraph.io.formats.GiraphFileInputFormat:
Total input paths to process : 1
>>>>> 2014-07-02 17:01:45,149 INFO org.apache.giraph.master.BspServiceMaster:
generateVertexInputSplits: Got 2 input splits for 1 input threads
>>>>> 2014-07-02 17:01:45,149 INFO org.apache.giraph.master.BspServiceMaster:
createVertexInputSplits: Starting to write input split data to zookeeper with 1 threads
>>>>> 2014-07-02 17:01:45,163 INFO org.apache.giraph.master.BspServiceMaster:
createVertexInputSplits: Done writing input split data to zookeeper
>>>>> 2014-07-02 17:01:45,173 INFO org.apache.giraph.comm.netty.NettyClient:
Using Netty without authentication.
>>>>> 2014-07-02 17:01:45,187 INFO org.apache.giraph.comm.netty.NettyClient:
connectAllAddresses: Successfully added 1 connections, (1 total connected) 0 failed, 0 failures
total.
>>>>> 2014-07-02 17:01:45,188 INFO org.apache.giraph.partition.PartitionUtils:
computePartitionCount: Creating 1, default would have been 1 partitions.
>>>>> 2014-07-02 17:01:45,350 INFO org.apache.giraph.comm.netty.NettyServer:
start: Using Netty without authentication.
>>>>> 2014-07-02 17:06:45,347 INFO org.apache.giraph.master.BspServiceMaster:
barrierOnWorkerList: 0 out of 1 workers finished on superstep -1 on path /_hadoopBsp/job_201407021636_0006/_vertexInputSplitDoneDir
>>>>> 2014-07-02 17:06:45,349 INFO org.apache.giraph.master.BspServiceMaster:
barrierOnWorkerList: Waiting on [ec2-54-186-5-159.us-west-2.compute.amazonaws.com_1]
>>>>> 2014-07-02 17:11:45,355 INFO org.apache.giraph.master.BspServiceMaster:
barrierOnWorkerList: 0 out of 1 workers finished on superstep -1 on path /_hadoopBsp/job_201407021636_0006/_vertexInputSplitDoneDir
>>>>> 2014-07-02 17:11:45,355 INFO org.apache.giraph.master.BspServiceMaster:
barrierOnWorkerList: Waiting on [ec2-54-186-5-159.us-west-2.compute.amazonaws.com_1]
>>>>> 2014-07-02 17:12:07,201 INFO org.apache.giraph.zk.ZooKeeperManager: run:
Shutdown hook started.
>>>>> 2014-07-02 17:12:07,202 WARN org.apache.giraph.zk.ZooKeeperManager: onlineZooKeeperServers:
Forced a shutdown hook kill of the ZooKeeper process.
>>>>> 2014-07-02 17:12:07,518 INFO org.apache.zookeeper.ClientCnxn: Unable
to read additional data from server sessionid 0x146f806378d0000, likely server has closed
socket, closing socket connection and attempting reconnect
>>>>> 2014-07-02 17:12:07,518 INFO org.apache.giraph.zk.ZooKeeperManager: onlineZooKeeperServers:
ZooKeeper process exited with 143 (note that 143 typically means killed).
>>>>> ===========================================================================
>>>>>
>>>>> The second one, task_201407021636_0006_m_000001, goes to the FAILED
>>>>> state.  syslog:
>>>>>
>>>>> ===========================================================================
>>>>>
>>>>> 2014-07-02 17:01:38,016 INFO org.apache.hadoop.util.NativeCodeLoader:
Loaded the native-hadoop library
>>>>> 2014-07-02 17:01:38,203 WARN org.apache.hadoop.metrics2.impl.MetricsSystemImpl:
Source name ugi already exists!
>>>>> 2014-07-02 17:01:38,379 INFO org.apache.giraph.graph.GraphTaskManager:
setup: Log level remains at info
>>>>> 2014-07-02 17:01:40,280 INFO org.apache.giraph.graph.GraphTaskManager:
Distributed cache is empty. Assuming fatjar.
>>>>> 2014-07-02 17:01:40,281 INFO org.apache.giraph.graph.GraphTaskManager:
setup: classpath @ /home/ubuntu/hdfstmp/mapred/local/taskTracker/ubuntu/jobcache/job_201407021636_0006/jars/job.jar
for job Giraph: org.apache.giraph.examples.SimpleShortestPathsVertex
>>>>> 2014-07-02 17:01:40,317 INFO org.apache.giraph.zk.ZooKeeperManager: createCandidateStamp:
Made the directory _bsp/_defaultZkManagerDir/job_201407021636_0006
>>>>> 2014-07-02 17:01:40,320 INFO org.apache.giraph.zk.ZooKeeperManager: createCandidateStamp:
Creating my filestamp _bsp/_defaultZkManagerDir/job_201407021636_0006/_task/ec2-54-186-5-159.us-west-2.compute.amazonaws.com
1
>>>>> 2014-07-02 17:01:42,109 INFO org.apache.giraph.zk.ZooKeeperManager: getZooKeeperServerList:
For task 1, got file 'zkServerList_ec2-54-186-5-159.us-west-2.compute.amazonaws.com 0 ' (polling
period is 3000)
>>>>> 2014-07-02 17:01:42,313 INFO org.apache.giraph.zk.ZooKeeperManager: getZooKeeperServerList:
Found [ec2-54-186-5-159.us-west-2.compute.amazonaws.com, 0] 2 hosts in filename 'zkServerList_ec2-54-186-5-159.us-west-2.compute.amazonaws.com
0 '
>>>>> 2014-07-02 17:01:42,316 INFO org.apache.giraph.zk.ZooKeeperManager: onlineZooKeeperServers:
Got [ec2-54-186-5-159.us-west-2.compute.amazonaws.com] 1 hosts from 1 ready servers when 1
required (polling period is 3000) on attempt 0
>>>>> 2014-07-02 17:01:42,316 INFO org.apache.giraph.graph.GraphTaskManager:
setup: Starting up BspServiceWorker...
>>>>> 2014-07-02 17:01:42,330 INFO org.apache.giraph.bsp.BspService: BspService:
Connecting to ZooKeeper with job job_201407021636_0006, 1 on ec2-54-186-5-159.us-west-2.compute.amazonaws.com:22181
>>>>> 2014-07-02 17:01:42,337 INFO org.apache.zookeeper.ZooKeeper: Client environment:zookeeper.version=3.3.3-1073969,
built on 02/23/2011 22:27 GMT
>>>>> 2014-07-02 17:01:42,337 INFO org.apache.zookeeper.ZooKeeper: Client environment:host.name=ec2-54-186-5-159.us-west-2.compute.amazonaws.com
>>>>> 2014-07-02 17:01:42,337 INFO org.apache.zookeeper.ZooKeeper: Client environment:java.version=1.7.0_60
>>>>> 2014-07-02 17:01:42,337 INFO org.apache.zookeeper.ZooKeeper: Client environment:java.vendor=Oracle
Corporation
>>>>> 2014-07-02 17:01:42,337 INFO org.apache.zookeeper.ZooKeeper: Client environment:java.home=/usr/lib/jvm/java-7-oracle/jre
>>>>> 2014-07-02 17:01:42,338 INFO org.apache.zookeeper.ZooKeeper: Client
>>>>>  environment:java.class.path=/home/ubuntu/hdfstmp/mapred/local/taskTracker/ubuntu/jobcache/job_201407021636_0006/jars/classes:/home/ubuntu/hdfstmp/mapred/local/taskTracker/ubuntu/jobcache/job_201407021636_0006/jars:/home/ubuntu/hdfstmp/mapred/local/taskTracker/ubuntu/jobcache/job_201407021636_0006/attempt_201407021636_0006_m_000001_0/work:/home/ubuntu/hadoop/bin/../conf:/usr/lib/jvm/java-7-oracle/lib/tools.jar:/home/ubuntu/hadoop/bin/..:/home/ubuntu/hadoop/bin/../hadoop-core-0.20.203.0.jar:/home/ubuntu/hadoop/bin/../lib/aspectjrt-1.6.5.jar:/home/ubuntu/hadoop/bin/../lib/aspectjtools-1.6.5.jar:/home/ubuntu/hadoop/bin/../lib/commons-beanutils-1.7.0.jar:/home/ubuntu/hadoop/bin/../lib/commons-beanutils-core-1.8.0.jar:/home/ubuntu/hadoop/bin/../lib/commons-cli-1.2.jar:/home/ubuntu/hadoop/bin/../lib/commons-codec-1.4.jar:/home/ubuntu/hadoop/bin/../lib/commons-collections-3.2.1.jar:/home/ubuntu/hadoop/bin/../lib/commons-configuration-1.6.jar:/home/ubuntu/hadoo
>>>>> p/bin/../lib/commons-daemon-1.0.1.jar:/home/ubuntu/hadoop/bin/../lib/commons-digester-1.8.jar:/home/ubuntu/hadoop/bin/../lib/commons-el-1.0.jar:/home/ubuntu/hadoop/bin/../lib/commons-httpclient-3.0.1.jar:/home/ubuntu/hadoop/bin/../lib/commons-lang-2.4.jar:/home/ubuntu/hadoop/bin/../lib/commons-logging-1.1.1.jar:/home/ubuntu/hadoop/bin/../lib/commons-logging-api-1.0.4.jar:/home/ubuntu/hadoop/bin/../lib/commons-math-2.1.jar:/home/ubuntu/hadoop/bin/../lib/commons-net-1.4.1.jar:/home/ubuntu/hadoop/bin/../lib/core-3.1.1.jar:/home/ubuntu/hadoop/bin/../lib/hsqldb-1.8.0.10.jar:/home/ubuntu/hadoop/bin/../lib/jackson-core-asl-1.0.1.jar:/home/ubuntu/hadoop/bin/../lib/jackson-mapper-asl-1.0.1.jar:/home/ubuntu/hadoop/bin/../lib/jasper-compiler-5.5.12.jar:/home/ubuntu/hadoop/bin/../lib/jasper-runtime-5.5.12.jar:/home/ubuntu/hadoop/bin/../lib/jets3t-0.6.1.jar:/home/ubuntu/hadoop/bin/../lib/jetty-6.1.26.jar:/home/ubuntu/hadoop/bin/../lib/jetty-util-6.1.26.jar:/home/ubu
>>>>> ntu/hadoop/bin/../lib/jsch-0.1.42.jar:/home/ubuntu/hadoop/bin/../lib/junit-4.5.jar:/home/ubuntu/hadoop/bin/../lib/kfs-0.2.2.jar:/home/ubuntu/hadoop/bin/../lib/log4j-1.2.15.jar:/home/ubuntu/hadoop/bin/../lib/mockito-all-1.8.5.jar:/home/ubuntu/hadoop/bin/../lib/oro-2.0.8.jar:/home/ubuntu/hadoop/bin/../lib/servlet-api-2.5-20081211.jar:/home/ubuntu/hadoop/bin/../lib/slf4j-api-1.4.3.jar:/home/ubuntu/hadoop/bin/../lib/slf4j-log4j12-1.4.3.jar:/home/ubuntu/hadoop/bin/../lib/xmlenc-0.52.jar:/home/ubuntu/hadoop/bin/../lib/jsp-2.1/jsp-2.1.jar:/home/ubuntu/hadoop/bin/../lib/jsp-2.1/jsp-api-2.1.jar
>>>>> 2014-07-02 17:01:42,338 INFO org.apache.zookeeper.ZooKeeper: Client environment:java.library.path=/home/ubuntu/hadoop/bin/../lib/native/Linux-amd64-64:/home/ubuntu/hdfstmp/mapred/local/taskTracker/ubuntu/jobcache/job_201407021636_0006/attempt_201407021636_0006_m_000001_0/work
>>>>> 2014-07-02 17:01:42,338 INFO org.apache.zookeeper.ZooKeeper: Client environment:java.io.tmpdir=/home/ubuntu/hdfstmp/mapred/local/taskTracker/ubuntu/jobcache/job_201407021636_0006/attempt_201407021636_0006_m_000001_0/work/tmp
>>>>> 2014-07-02 17:01:42,338 INFO org.apache.zookeeper.ZooKeeper: Client environment:java.compiler=<NA>
>>>>> 2014-07-02 17:01:42,338 INFO org.apache.zookeeper.ZooKeeper: Client environment:os.name=Linux
>>>>> 2014-07-02 17:01:42,338 INFO org.apache.zookeeper.ZooKeeper: Client environment:os.arch=amd64
>>>>> 2014-07-02 17:01:42,338 INFO org.apache.zookeeper.ZooKeeper: Client environment:os.version=3.2.0-58-virtual
>>>>> 2014-07-02 17:01:42,338 INFO org.apache.zookeeper.ZooKeeper: Client environment:user.name=ubuntu
>>>>> 2014-07-02 17:01:42,338 INFO org.apache.zookeeper.ZooKeeper: Client environment:user.home=/home/ubuntu
>>>>> 2014-07-02 17:01:42,338 INFO org.apache.zookeeper.ZooKeeper: Client environment:user.dir=/home/ubuntu/hdfstmp/mapred/local/taskTracker/ubuntu/jobcache/job_201407021636_0006/attempt_201407021636_0006_m_000001_0/work
>>>>> 2014-07-02 17:01:42,339 INFO org.apache.zookeeper.ZooKeeper: Initiating
client connection, connectString=ec2-54-186-5-159.us-west-2.compute.amazonaws.com:22181 sessionTimeout=60000
watcher=org.apache.giraph.worker.BspServiceWorker@54a5733f
>>>>> 2014-07-02 17:01:42,347 INFO org.apache.zookeeper.ClientCnxn: Opening
socket connection to server ec2-54-186-5-159.us-west-2.compute.amazonaws.com/172.31.45.24:22181
>>>>> 2014-07-02
17:01:42,347 INFO org.apache.zookeeper.ClientCnxn: Socket connection established to ec2-54-186-5-159.us-west-2.compute.amazonaws.com/172.31.45.24:22181,
initiating session
>>>>> 2014-07-02 17:01:42,812 INFO org.apache.zookeeper.ClientCnxn: Session
establishment complete on server ec2-54-186-5-159.us-west-2.compute.amazonaws.com/172.31.45.24:22181,
sessionid = 0x146f806378d0001, negotiated timeout = 600000
>>>>> 2014-07-02 17:01:42,814 INFO org.apache.giraph.bsp.BspService: process:
Asynchronous connection complete.
>>>>> 2014-07-02 17:01:42,819 INFO org.apache.giraph.comm.netty.NettyWorkerServer:
createMessageStoreFactory: Using ByteArrayMessagesPerVertexStore since there is no combiner
>>>>> 2014-07-02 17:01:42,881 INFO org.apache.giraph.comm.netty.NettyServer:
NettyServer: Using execution handler with 8 threads after requestFrameDecoder.
>>>>> 2014-07-02 17:01:42,904 INFO org.apache.giraph.comm.netty.NettyServer:
start: Started server communication server: ec2-54-186-5-159.us-west-2.compute.amazonaws.com/172.31.45.24:30001
with up to 16 threads on bind attempt 0 with sendBufferSize = 32768 receiveBufferSize = 524288
backlog = 1
>>>>> 2014-07-02 17:01:42,907 INFO org.apache.giraph.comm.netty.NettyClient:
NettyClient: Using execution handler with 8 threads after requestEncoder.
>>>>> 2014-07-02 17:01:42,914 INFO org.apache.giraph.graph.GraphTaskManager:
setup: Registering health of this worker...
>>>>> 2014-07-02 17:01:45,049 INFO org.apache.giraph.bsp.BspService: process:
applicationAttemptChanged signaled
>>>>> 2014-07-02 17:01:45,068 WARN org.apache.giraph.bsp.BspService: process:
Unknown and unprocessed event (path=/_hadoopBsp/job_201407021636_0006/_applicationAttemptsDir/0/_superstepDir,
type=NodeChildrenChanged, state=SyncConnected)
>>>>> 2014-07-02 17:01:45,076 INFO org.apache.giraph.worker.BspServiceWorker:
registerHealth: Created my health node for attempt=0, superstep=-1 with /_hadoopBsp/job_201407021636_0006/_applicationAttemptsDir/0/_superstepDir/-1/_workerHealthyDir/ec2-54-186-5-159.us-west-2.compute.amazonaws.com_1
and workerInfo= Worker(hostname=ec2-54-186-5-159.us-west-2.compute.amazonaws.com, MRtaskID=1,
port=30001)
>>>>> 2014-07-02 17:01:45,183 INFO org.apache.giraph.comm.netty.NettyServer:
start: Using Netty without authentication.
>>>>> 2014-07-02 17:01:45,339 INFO org.apache.giraph.bsp.BspService: process:
partitionAssignmentsReadyChanged (partitions are assigned)
>>>>> 2014-07-02 17:01:45,345 INFO org.apache.giraph.worker.BspServiceWorker:
startSuperstep: Master(hostname=ec2-54-186-5-159.us-west-2.compute.amazonaws.com, MRtaskID=0,
port=30000)
>>>>> 2014-07-02 17:01:45,345 INFO org.apache.giraph.worker.BspServiceWorker:
startSuperstep: Ready for computation on superstep -1 since worker selection and vertex range
assignments are done in /_hadoopBsp/job_201407021636_0006/_applicationAttemptsDir/0/_superstepDir/-1/_addressesAndPartitions
>>>>> 2014-07-02 17:01:45,346 INFO org.apache.giraph.comm.netty.NettyClient:
Using Netty without authentication.
>>>>> 2014-07-02 17:01:45,354 INFO org.apache.giraph.comm.netty.NettyClient:
connectAllAddresses: Successfully added 1 connections, (1 total connected) 0 failed, 0 failures
total.
>>>>> 2014-07-02 17:01:45,359 INFO org.apache.giraph.worker.BspServiceWorker:
loadInputSplits: Using 1 thread(s), originally 1 threads(s) for 2 total splits.
>>>>> 2014-07-02 17:01:45,362 INFO org.apache.giraph.comm.SendPartitionCache:
SendPartitionCache: maxVerticesPerTransfer = 10000
>>>>> 2014-07-02 17:01:45,362 INFO org.apache.giraph.comm.SendPartitionCache:
SendPartitionCache: maxEdgesPerTransfer = 80000
>>>>> 2014-07-02 17:01:45,372 INFO org.apache.giraph.worker.InputSplitsHandler:
reserveInputSplit: Reserved input split path /_hadoopBsp/job_201407021636_0006/_vertexInputSplitDir/0,
overall roughly 0.0% input splits reserved
>>>>> 2014-07-02 17:01:45,373 INFO org.apache.giraph.worker.InputSplitsCallable:
getInputSplit: Reserved /_hadoopBsp/job_201407021636_0006/_vertexInputSplitDir/0 from ZooKeeper
and got input split 'hdfs://ec2-54-186-5-159.us-west-2.compute.amazonaws.com:54310/user/ubuntu/input/CA.txt:0+67108864'
>>>>> 2014-07-02 17:01:46,615 INFO org.apache.giraph.worker.VertexInputSplitsCallable:
readVertexInputSplit: Loaded 250000 vertices at 200412.01085983627 vertices/sec 697447 edges
at 560085.1663971642 edges/sec Memory (free/total/max) = 112.94M / 182.50M / 182.50M
>>>>> 2014-07-02 17:01:47,440 INFO org.apache.giraph.worker.VertexInputSplitsCallable:
readVertexInputSplit: Loaded 500000 vertices at 241131.36490688394 vertices/sec 1419367 edges
at 685221.3493060106 edges/sec Memory (free/total/max) = 45.07M / 187.50M / 187.50M
>>>>> 2014-07-02 17:01:51,322 INFO org.apache.giraph.worker.VertexInputSplitsCallable:
readVertexInputSplit: Loaded 750000 vertices at 125921.72750649283 vertices/sec 2149814 edges
at 361077.323111284 edges/sec Memory (free/total/max) = 16.73M / 189.50M / 189.50M
>>>>> 2014-07-02 17:01:55,205 ERROR org.apache.giraph.worker.BspServiceWorker:
unregisterHealth: Got failure, unregistering health on /_hadoopBsp/job_201407021636_0006/_applicationAttemptsDir/0/_superstepDir/-1/_workerHealthyDir/ec2-54-186-5-159.us-west-2.compute.amazonaws.com_1
on superstep -1
>>>>> 2014-07-02 17:01:55,213 INFO org.apache.hadoop.mapred.TaskLogsTruncater:
Initializing logs' truncater with mapRetainSize=-1 and reduceRetainSize=-1
>>>>> 2014-07-02 17:01:55,716 INFO org.apache.hadoop.io.nativeio.NativeIO:
Initialized cache for UID to User mapping with a cache timeout of 14400 seconds.
>>>>> 2014-07-02 17:01:55,717 INFO org.apache.hadoop.io.nativeio.NativeIO:
Got UserName ubuntu for UID 1000 from the native implementation
>>>>> 2014-07-02 17:01:55,718 WARN org.apache.hadoop.mapred.Child: Error running
child
>>>>> java.lang.IllegalStateException: run: Caught an unrecoverable exception
waitFor: ExecutionException occurred while waiting for org.apache.giraph.utils.ProgressableUtils$FutureWaitable@36df8f5
>>>>> 	at org.apache.giraph.graph.GraphMapper.run(GraphMapper.java:102)
>>>>> 	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:763)
>>>>> 	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:369)
>>>>> 	at org.apache.hadoop.mapred.Child$4.run(Child.java:259)
>>>>> 	at java.security.AccessController.doPrivileged(Native Method)
>>>>> 	at javax.security.auth.Subject.doAs(Subject.java:415)
>>>>> 	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
>>>>> 	at org.apache.hadoop.mapred.Child.main(Child.java:253)
>>>>> Caused by: java.lang.IllegalStateException: waitFor: ExecutionException
occurred while waiting for org.apache.giraph.utils.ProgressableUtils$FutureWaitable@36df8f5
>>>>> 	at org.apache.giraph.utils.ProgressableUtils.waitFor(ProgressableUtils.java:151)
>>>>> 	at org.apache.giraph.utils.ProgressableUtils.waitForever(ProgressableUtils.java:111)
>>>>> 	at org.apache.giraph.utils.ProgressableUtils.getFutureResult(ProgressableUtils.java:73)
>>>>> 	at org.apache.giraph.utils.ProgressableUtils.getResultsWithNCallables(ProgressableUtils.java:192)
>>>>> 	at org.apache.giraph.worker.BspServiceWorker.loadInputSplits(BspServiceWorker.java:276)
>>>>> 	at org.apache.giraph.worker.BspServiceWorker.loadVertices(BspServiceWorker.java:323)
>>>>> 	at org.apache.giraph.worker.BspServiceWorker.setup(BspServiceWorker.java:506)
>>>>> 	at org.apache.giraph.graph.GraphTaskManager.execute(GraphTaskManager.java:230)
>>>>> 	at org.apache.giraph.graph.GraphMapper.run(GraphMapper.java:92)
>>>>> 	... 7 more
>>>>> Caused by: java.util.concurrent.ExecutionException: java.lang.OutOfMemoryError:
Java heap space
>>>>> 	at java.util.concurrent.FutureTask.report(FutureTask.java:122)
>>>>> 	at java.util.concurrent.FutureTask.get(FutureTask.java:202)
>>>>> 	at org.apache.giraph.utils.ProgressableUtils$FutureWaitable.waitFor(ProgressableUtils.java:271)
>>>>> 	at org.apache.giraph.utils.ProgressableUtils.waitFor(ProgressableUtils.java:143)
>>>>> 	... 15 more
>>>>> Caused by: java.lang.OutOfMemoryError: Java heap space
>>>>> 	at java.util.concurrent.ConcurrentHashMap$Segment.rehash(ConcurrentHashMap.java:501)
>>>>> 	at java.util.concurrent.ConcurrentHashMap$Segment.put(ConcurrentHashMap.java:460)
>>>>> 	at java.util.concurrent.ConcurrentHashMap.put(ConcurrentHashMap.java:1130)
>>>>> 	at org.apache.giraph.partition.SimplePartition.addPartition(SimplePartition.java:87)
>>>>> 	at org.apache.giraph.partition.SimplePartitionStore.addPartition(SimplePartitionStore.java:71)
>>>>> 	at org.apache.giraph.comm.requests.SendVertexRequest.doRequest(SendVertexRequest.java:81)
>>>>> 	at org.apache.giraph.comm.netty.NettyWorkerClientRequestProcessor.doRequest(NettyWorkerClientRequestProcessor.java:470)
>>>>> 	at org.apache.giraph.comm.netty.NettyWorkerClientRequestProcessor.sendPartitionRequest(NettyWorkerClientRequestProcessor.java:203)
>>>>> 	at org.apache.giraph.comm.netty.NettyWorkerClientRequestProcessor.sendVertexRequest(NettyWorkerClientRequestProcessor.java:267)
>>>>> 	at org.apache.giraph.worker.VertexInputSplitsCallable.readInputSplit(VertexInputSplitsCallable.java:140)
>>>>> 	at org.apache.giraph.worker.InputSplitsCallable.loadInputSplit(InputSplitsCallable.java:220)
>>>>> 	at org.apache.giraph.worker.InputSplitsCallable.call(InputSplitsCallable.java:161)
>>>>> 	at org.apache.giraph.worker.InputSplitsCallable.call(InputSplitsCallable.java:58)
>>>>> 	at org.apache.giraph.utils.LogStacktraceCallable.call(LogStacktraceCallable.java:51)
>>>>> 	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>>>>> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>>> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>>> 	at java.lang.Thread.run(Thread.java:745)
>>>>> 2014-07-02 17:01:55,722 INFO org.apache.hadoop.mapred.Task: Runnning
cleanup for the task
>>>>>
>>>>>
>>>>> ===========================================================================
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> I've tried increasing Java Heap Space in hadoop/conf/mapred-site.xml
>>>>> by adding this:
>>>>>   <property>
>>>>>     <name>mapred.child.java.opts</name>
>>>>>     <value>-Xmx1024m</value>
>>>>>   </property>
>>>>>
>>>>> But that just caused the entire job to fail from the start.
>>>>>
>>>>> Before using this version of Giraph, I used 1.0.0 and 1.1.0-RC0, and
>>>>> those versions gave me more (and different) errors to debug, related to
>>>>> problems with Hadoop itself.  So the Giraph version I'm currently using
>>>>> seems best for me, since its errors seem more manageable.
>>>>>
>>>>> What can I do to fix this error?  I thought Giraph was built for
>>>>> large-scale graph processing, so I suppose someone testing large graphs
>>>>> has run into this problem before.  I searched through the mailing list
>>>>> archives but couldn't find anything, though.
>>>>>
>>>>> I can provide more information if you need it.  Thanks a lot.
>>>>>
>>>>> Bryan Rowe
>>>>>
>>>>>
>>>>>
>>>>> *Then Young Han replied:*
>>>>>
>>>>> 10 minutes seems way too long to load 91 MB from HDFS. Are you sure
>>>>> your graph's format is correct? For the Json input formats, each line
>>>>> of your file should be:
>>>>>
>>>>> [vertex id, vertex value, [[dst id, edge weight], [dst id, edge
>>>>> weight], ..., [dst id, edge weight]]]
>>>>>
>>>>> You can set the vertex value for every vertex to 0 (SSSP will
>>>>> overwrite that value in the first superstep). If your graph doesn't have
>>>>> edge weights, you can just set them to 1.
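As a quick sanity check on the input file (not part of Giraph itself — the function name and the per-field checks below are my own), a small Python sketch could validate each line against that `[vertex id, vertex value, [[dst id, edge weight], ...]]` shape before you feed the file to the job:

```python
import json

def check_graph_line(line):
    """Return True if the line matches the expected JSON adjacency
    format: [vertex id, vertex value, [[dst id, edge weight], ...]]"""
    try:
        record = json.loads(line)
    except ValueError:
        return False  # not valid JSON at all
    if not (isinstance(record, list) and len(record) == 3):
        return False
    vertex_id, vertex_value, edges = record
    if not isinstance(vertex_id, int):
        return False
    if not isinstance(vertex_value, (int, float)):
        return False
    if not isinstance(edges, list):
        return False
    # every edge must be a [dst id, edge weight] pair of numbers
    return all(
        isinstance(e, list) and len(e) == 2
        and all(isinstance(x, (int, float)) for x in e)
        for e in edges
    )
```

Running every line of the input through such a check would rule out a malformed line as the cause of the slow load.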
>>>>>
>>>>> Also, have you tried a larger Xmx value? E.g., 4096m or 8192m.
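To pick an Xmx value less blindly, here is a rough back-of-envelope sketch. The per-vertex and per-edge byte costs are assumptions of mine, not measured Giraph numbers — real overhead depends on your vertex/edge types and the JVM:

```python
# Assumed per-object heap costs (illustrative only):
BYTES_PER_VERTEX = 200
BYTES_PER_EDGE = 50

def suggested_xmx_mb(num_vertices, num_edges, safety_factor=3):
    """Very rough heap estimate in MiB, padded by a safety factor
    to cover message buffers, partitions, and GC headroom."""
    raw = num_vertices * BYTES_PER_VERTEX + num_edges * BYTES_PER_EDGE
    return (raw * safety_factor) // (1024 * 1024)
```

Under these assumptions, a graph with a million vertices and a few million edges already wants around 1 GB of heap per worker, which is consistent with the ~190 MB heap in the log above running out after ~750,000 vertices.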
>>>>>
>>>>> Young
>>>>>
>>>>>
>>>>> *Then I replied:*
>>>>> Hi Young,
>>>>>
>>>>> I believe my graph has the correct format.
>>>>> Here are the first 40 lines of the graph I'm using:
>>>>> [0,0,[[1,1],[4,1],[5,1]]]
>>>>> [1,0,[[0,1],[2,1],[3,1]]]
>>>>> [2,0,[[1,1],[6,1],[7,1]]]
>>>>> [3,0,[[8,1],[1,1],[9,1]]]
>>>>> [4,0,[[0,1],[10,1],[11,1]]]
>>>>> [5,0,[[0,1]]]
>>>>> [6,0,[[2,1],[12,1]]]
>>>>> [7,0,[[2,1],[12,1],[13,1]]]
>>>>> [8,0,[[11,1],[3,1],[60,1]]]
>>>>> [9,0,[[3,1]]]
>>>>> [10,0,[[35,1],[4,1],[38,1]]]
>>>>> [11,0,[[8,1],[59,1],[4,1]]]
>>>>> [12,0,[[41,1],[6,1],[7,1]]]
>>>>> [13,0,[[89,1],[90,1],[91,1],[7,1]]]
>>>>> [14,0,[[18,1],[19,1],[15,1]]]
>>>>> [15,0,[[16,1],[17,1],[14,1]]]
>>>>> [16,0,[[20,1],[21,1],[22,1],[15,1]]]
>>>>> [17,0,[[24,1],[23,1],[21,1],[15,1]]]
>>>>> [18,0,[[24,1],[14,1]]]
>>>>> [19,0,[[25,1],[22,1],[14,1]]]
>>>>> [20,0,[[16,1],[25,1],[26,1]]]
>>>>> [21,0,[[16,1],[17,1],[30,1]]]
>>>>> [22,0,[[16,1],[19,1]]]
>>>>> [23,0,[[17,1],[105,1]]]
>>>>> [24,0,[[17,1],[18,1],[58,1]]]
>>>>> [25,0,[[27,1],[19,1],[20,1]]]
>>>>> [26,0,[[28,1],[27,1],[20,1]]]
>>>>> [27,0,[[25,1],[26,1],[29,1]]]
>>>>> [28,0,[[26,1],[29,1],[30,1]]]
>>>>> [29,0,[[27,1],[28,1],[31,1]]]
>>>>> [30,0,[[32,1],[33,1],[28,1],[21,1]]]
>>>>> [31,0,[[34,1],[29,1]]]
>>>>> [32,0,[[105,1],[30,1],[39,1]]]
>>>>> [33,0,[[30,1]]]
>>>>> [34,0,[[38,1],[31,1]]]
>>>>> [35,0,[[10,1],[36,1],[37,1]]]
>>>>> [36,0,[[40,1],[35,1],[39,1]]]
>>>>> [37,0,[[41,1],[35,1]]]
>>>>> [38,0,[[10,1],[34,1]]]
>>>>> [39,0,[[32,1],[58,1],[36,1],[119,1]]]
>>>>> [40,0,[[90,1],[36,1]]]
>>>>>
>>>>> Also, sorry about sending the email twice.  My email client messed up.
>>>>>
>>>>> Thanks,
>>>>> Bryan
>>>>>
>>>>> I've tried a larger graph.  It ran for 2 hours and then failed, with
>>>>> what I believe were different error messages.
>>>>>
>>>>> Bryan
>>>>>
>>>>
>>>>
>>>
>>
>
