giraph-user mailing list archives

From Young Han <young....@uwaterloo.ca>
Subject Re: Errors with large graphs (continued, new email)
Date Thu, 03 Jul 2014 22:59:56 GMT
Great.

I think it takes so long because you have a graph with a very large
diameter (hence the very high # of supersteps)---it's one of the cases
where BSP doesn't do so well. If you try graphs with smaller diameters
(e.g., social network/webgraphs from SNAP), you'll find that they finish
much faster.

It's reasonable. Input loading didn't take more than 20 seconds, and each
superstep is at most ~1.5s. I don't believe adding more machines will help
in this case, as it took a long time due to the sheer number of supersteps
rather than the cost of a single superstep. (I.e., adding more machines
will not reduce the # of supersteps---you'll probably find most of your
machines idling.)
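To make the diameter point concrete, here is a toy sketch (plain Python, not Giraph code; all names are illustrative) of a synchronous BSP shortest-paths loop. It shows that the superstep count tracks the graph's diameter, not its size, which is why adding machines would not have helped here:

```python
# Toy BSP (Pregel-style) shortest-paths simulation: the number of
# supersteps grows with graph diameter, not vertex count.
from collections import defaultdict

def bsp_sssp_supersteps(edges, source):
    """Run synchronous SSSP over (src, dst, weight) edges; return superstep count."""
    adj = defaultdict(list)
    for u, v, w in edges:
        adj[u].append((v, w))
    dist = defaultdict(lambda: float("inf"))
    inbox = {source: [0.0]}            # messages delivered this superstep
    supersteps = 0
    while inbox:                       # halt when no messages are in flight
        outbox = defaultdict(list)
        for v, msgs in inbox.items():
            best = min(msgs)
            if best < dist[v]:         # a vertex wakes up only on improvement
                dist[v] = best
                for nbr, w in adj[v]:
                    outbox[nbr].append(best + w)
        inbox = outbox
        supersteps += 1
    return supersteps

# A path graph (diameter n-1) vs. a star graph (diameter 2), same vertex count.
n = 100
path = [(i, i + 1, 1.0) for i in range(n - 1)]
star = [(0, i, 1.0) for i in range(1, n)]
print(bsp_sssp_supersteps(path, 0))  # prints 100: one hop of progress per superstep
print(bsp_sssp_supersteps(star, 0))  # prints 2
```

With the same vertex count, the long path needs fifty times the supersteps of the star; a road network like CA.txt behaves like the path, a social graph more like the star.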

Young


On Thu, Jul 3, 2014 at 6:46 PM, Bryan Rowe <bryanrowe09@gmail.com> wrote:

> Awesome, the job completed after adding those settings:
> http://ec2-54-186-5-159.us-west-2.compute.amazonaws.com:50030/jobdetails.jsp?jobid=job_201407032211_0002&refresh=0
>
> Thanks a lot for the help.  It did take 13 min 47 s to complete, though,
> which seems kind of long.  However, it is running on just one node; it will
> probably be a lot faster on a multi-node setup, which I plan on trying.
>
> One more thing, do you think that the time it took to complete is
> reasonable for a one-node setup?
>
> Thanks,
> Bryan
>
>
> On Thu, Jul 3, 2014 at 2:59 PM, Young Han <young.han@uwaterloo.ca> wrote:
>
>> Ahh yes, that's another common error. Try adding this to your mapred-site.xml:
>>
>> <property>
>>   <name>mapreduce.job.counters.max</name>
>>   <value>1000000</value>
>> </property>
>> <property>
>>   <name>mapreduce.job.counters.limit</name>
>>   <value>1000000</value>
>> </property>
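As a quick sanity check that the new properties parse as intended, one could run something like the following standalone sketch (the helper and the inlined config string are illustrative, not a Hadoop API; point it at your real mapred-site.xml if you like):

```python
# Verify that the counter-limit properties are present in a Hadoop-style
# configuration XML document. The config below is an inlined example.
import xml.etree.ElementTree as ET

def get_property(conf_xml, name):
    """Return the <value> for the <property> with the given <name>, or None."""
    root = ET.fromstring(conf_xml)
    for prop in root.iter("property"):
        if prop.findtext("name") == name:
            return prop.findtext("value")
    return None

conf = """<configuration>
  <property><name>mapreduce.job.counters.max</name><value>1000000</value></property>
  <property><name>mapreduce.job.counters.limit</name><value>1000000</value></property>
</configuration>"""

print(get_property(conf, "mapreduce.job.counters.limit"))  # prints 1000000
```

Remember that Hadoop only reads the file at daemon startup, so a restart is still required after editing it.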
>>
>> Young
>>
>>
>> On Thu, Jul 3, 2014 at 5:49 PM, Bryan Rowe <bryanrowe09@gmail.com> wrote:
>>
>>> I just changed my mapred-site.xml file to increase the heap size:
>>> <configuration>
>>> <property>
>>> <name>mapred.job.tracker</name>
>>> <value>hdfs://ec2-54-186-5-159.us-west-2.compute.amazonaws.com:54311
>>> </value>
>>> </property>
>>>
>>> <property>
>>> <name>mapred.tasktracker.map.tasks.maximum</name>
>>> <value>4</value>
>>> </property>
>>>
>>> <property>
>>> <name>mapred.map.tasks</name>
>>> <value>4</value>
>>> </property>
>>>
>>>
>>> <property>
>>> <name>mapred.child.java.opts</name>
>>> <value>-Xmx1024m</value>
>>> </property>
>>> </configuration>
>>>
>>>
>>> I ran a Hadoop job and it seemed to be working for the first 2 minutes
>>> because it was computing supersteps.  However, after 2 minutes it failed.
>>> Part of the reason is that the Hadoop counters exceeded 120.  There were
>>> other errors, though, and I'm not sure if those are related to the counters
>>> problem.
>>>
>>> Here is an image album of the errors:  http://imgur.com/a/mAqn3
>>> Also, here is a link to the Hadoop job that failed:
>>> http://ec2-54-186-5-159.us-west-2.compute.amazonaws.com:50030/jobdetails.jsp?jobid=job_201407032123_0002&refresh=30
>>>
>>> I'll keep my instance up and running so the link should work.
>>> What would you recommend next?
>>>
>>> Thanks,
>>> Bryan
>>>
>>>
>>> On Thu, Jul 3, 2014 at 2:19 PM, Young Han <young.han@uwaterloo.ca>
>>> wrote:
>>>
>>>> What sort of error does it give? Did you try restarting Hadoop after
>>>> setting it? (stop-all.sh, start-all.sh, hadoop dfs -safemode wait).
>>>>
>>>> Young
>>>>
>>>>
>>>> On Thu, Jul 3, 2014 at 5:08 PM, Bryan Rowe <bryanrowe09@gmail.com>
>>>> wrote:
>>>>
>>>>> Yes, before conversion they were also sorted by vertex ID.
>>>>> 0       1       1
>>>>> 0       4       1
>>>>> 0       5       1
>>>>> 1       0       1
>>>>> 1       2       1
>>>>> 1       3       1
>>>>> 2       1       1
>>>>> 2       6       1
>>>>> 2       7       1
>>>>> 3       8       1
>>>>>
>>>>> That was the pre-conversion format: one edge per line, as [source
>>>>> destination weight].
>>>>> I'm using a medium-tier EC2 instance, a c3.2xlarge.  It has 8 vCPUs and
>>>>> 15 GiB of memory according to Amazon's website.
>>>>>
>>>>> Any suggestions on tweaking Xmx?  I tried that without any luck;
>>>>> it just didn't run the map phase and then failed.
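The conversion step Bryan describes could be sketched as follows, assuming the edge list is sorted by source ID (as in the sample above) and the target is the Quick Start JSON layout, `[id, value, [[dest, weight], ...]]`. The helper name and the default vertex value are illustrative, and vertices that appear only as destinations get no line of their own here:

```python
# Sketch: convert a sorted "source destination weight" edge list into
# Giraph Quick Start-style JSON adjacency lines.
import json
from itertools import groupby

def convert(edge_lines, default_value=0.0):
    rows = (line.split() for line in edge_lines if line.strip())
    parsed = [(int(s), int(d), float(w)) for s, d, w in rows]
    out = []
    # groupby relies on the input being sorted by source vertex ID.
    for src, group in groupby(parsed, key=lambda e: e[0]):
        edges = [[d, w] for _, d, w in group]
        out.append(json.dumps([src, default_value, edges]))
    return out

sample = ["0 1 1", "0 4 1", "0 5 1", "1 0 1"]
for line in convert(sample):
    print(line)
# [0, 0.0, [[1, 1.0], [4, 1.0], [5, 1.0]]]
# [1, 0.0, [[0, 1.0]]]
```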
>>>>>
>>>>>
>>>>> On Thu, Jul 3, 2014 at 1:57 PM, Young Han <young.han@uwaterloo.ca>
>>>>> wrote:
>>>>>
>>>>>> From the other thread...
>>>>>>
>>>>>> Yeah, your input format looks correct. Did you have the graph sorted
>>>>>> by source vertex ID before conversion? (I'm not sure if duplicate entries
>>>>>> with the same source ID matter, but just in case.)
>>>>>>
>>>>>> They're all out of memory errors, so I think Xmx is the culprit. What
>>>>>> type of EC2 instances are you using? You probably want to use something
>>>>>> larger than t1.micro or m1.small.
>>>>>>
>>>>>> Young
>>>>>>
>>>>>>
>>>>>> On Thu, Jul 3, 2014 at 4:53 PM, Bryan Rowe <bryanrowe09@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> First of all, I started this email thread with my old Yahoo address,
>>>>>>> which was a mistake because it kept sending out duplicates.  Sorry
>>>>>>> for the inconvenience; I'll continue the thread from this address
>>>>>>> from now on.
>>>>>>>
>>>>>>> *I originally posted this:*
>>>>>>> Hello,
>>>>>>>
>>>>>>> Giraph: release-1.0.0-RC3
>>>>>>>
>>>>>>> In short, when I use large graphs with the Shortest Paths example,
>>>>>>> it fails.  But when I use the small graph provided on the Quick Start
>>>>>>> guide, it succeeds.
>>>>>>> I converted all of my large graphs into the format shown in the
>>>>>>> Quick Start guide to simplify things.
>>>>>>> I'm using a one-node setup.
>>>>>>>
>>>>>>>
>>>>>>> Here is the command I'm using:
>>>>>>> hadoop jar
>>>>>>> $GIRAPH_HOME/giraph-examples/target/giraph-examples-1.0.0-for-hadoop-0.20.203.0-jar-with-dependencies.jar
>>>>>>>
>>>>>>> org.apache.giraph.GiraphRunner
>>>>>>> org.apache.giraph.examples.SimpleShortestPathsVertex
>>>>>>> -vif
>>>>>>> org.apache.giraph.io.formats.JsonLongDoubleFloatDoubleVertexInputFormat
>>>>>>>  -vip /user/ubuntu/input/CA.txt -of
>>>>>>> org.apache.giraph.io.formats.IdWithValueTextOutputFormat
>>>>>>>  -op /user/ubuntu/output/shortestpaths
>>>>>>> -w 1
>>>>>>>
>>>>>>> (all on one line)
>>>>>>>
>>>>>>> CA.txt is a large graph file: 96,026,228 bytes
>>>>>>>
>>>>>>> The job fails in 10mins, 46sec.
>>>>>>>
>>>>>>> Two Map tasks are created when run.
>>>>>>> The first one, task_201407021636_0006_m_000000, is KILLED.  syslog:
>>>>>>>
>>>>>>> ===========================================================================
>>>>>>>
>>>>>>> 2014-07-02 17:01:34,757 INFO org.apache.hadoop.util.NativeCodeLoader: Loaded the native-hadoop library
>>>>>>> 2014-07-02 17:01:34,945 WARN org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Source name ugi already exists!
>>>>>>> 2014-07-02 17:01:35,127 INFO org.apache.giraph.graph.GraphTaskManager: setup: Log level remains at info
>>>>>>> 2014-07-02 17:01:35,159 INFO org.apache.giraph.graph.GraphTaskManager: Distributed cache is empty. Assuming fatjar.
>>>>>>> 2014-07-02 17:01:35,159 INFO org.apache.giraph.graph.GraphTaskManager: setup: classpath @ /home/ubuntu/hdfstmp/mapred/local/taskTracker/ubuntu/jobcache/job_201407021636_0006/jars/job.jar for job Giraph: org.apache.giraph.examples.SimpleShortestPathsVertex
>>>>>>> 2014-07-02 17:01:35,201 INFO org.apache.giraph.zk.ZooKeeperManager: createCandidateStamp: Made the directory _bsp/_defaultZkManagerDir/job_201407021636_0006
>>>>>>> 2014-07-02 17:01:35,204 INFO org.apache.giraph.zk.ZooKeeperManager: createCandidateStamp: Creating my filestamp _bsp/_defaultZkManagerDir/job_201407021636_0006/_task/ec2-54-186-5-159.us-west-2.compute.amazonaws.com 0
>>>>>>> 2014-07-02 17:01:35,219 INFO org.apache.giraph.zk.ZooKeeperManager: getZooKeeperServerList: Got [ec2-54-186-5-159.us-west-2.compute.amazonaws.com] 1 hosts from 1 candidates when 1 required (polling period is 3000) on attempt 0
>>>>>>> 2014-07-02 17:01:35,220 INFO org.apache.giraph.zk.ZooKeeperManager: createZooKeeperServerList: Creating the final ZooKeeper file '_bsp/_defaultZkManagerDir/job_201407021636_0006/zkServerList_ec2-54-186-5-159.us-west-2.compute.amazonaws.com 0 '
>>>>>>> 2014-07-02 17:01:35,228 INFO org.apache.giraph.zk.ZooKeeperManager: getZooKeeperServerList: For task 0, got file 'zkServerList_ec2-54-186-5-159.us-west-2.compute.amazonaws.com 0 ' (polling period is 3000)
>>>>>>> 2014-07-02 17:01:35,228 INFO org.apache.giraph.zk.ZooKeeperManager: getZooKeeperServerList: Found [ec2-54-186-5-159.us-west-2.compute.amazonaws.com, 0] 2 hosts in filename 'zkServerList_ec2-54-186-5-159.us-west-2.compute.amazonaws.com 0 '
>>>>>>> 2014-07-02 17:01:35,229 INFO org.apache.giraph.zk.ZooKeeperManager: onlineZooKeeperServers: Trying to delete old directory /home/ubuntu/hdfstmp/mapred/local/taskTracker/ubuntu/jobcache/job_201407021636_0006/work/_bspZooKeeper
>>>>>>> 2014-07-02 17:01:35,234 INFO org.apache.giraph.zk.ZooKeeperManager: generateZooKeeperConfigFile: Creating file /home/ubuntu/hdfstmp/mapred/local/taskTracker/ubuntu/jobcache/job_201407021636_0006/work/_bspZooKeeper/zoo.cfg in /home/ubuntu/hdfstmp/mapred/local/taskTracker/ubuntu/jobcache/job_201407021636_0006/work/_bspZooKeeper with base port 22181
>>>>>>> 2014-07-02 17:01:35,234 INFO org.apache.giraph.zk.ZooKeeperManager: generateZooKeeperConfigFile: Make directory of _bspZooKeeper = true
>>>>>>> 2014-07-02 17:01:35,235 INFO org.apache.giraph.zk.ZooKeeperManager: generateZooKeeperConfigFile: Delete of zoo.cfg = false
>>>>>>> 2014-07-02 17:01:35,236 INFO org.apache.giraph.zk.ZooKeeperManager: onlineZooKeeperServers: Attempting to start ZooKeeper server with command [/usr/lib/jvm/java-7-oracle/jre/bin/java, -Xmx512m, -XX:ParallelGCThreads=4, -XX:+UseConcMarkSweepGC, -XX:CMSInitiatingOccupancyFraction=70, -XX:MaxGCPauseMillis=100, -cp, /home/ubuntu/hdfstmp/mapred/local/taskTracker/ubuntu/jobcache/job_201407021636_0006/jars/job.jar, org.apache.zookeeper.server.quorum.QuorumPeerMain, /home/ubuntu/hdfstmp/mapred/local/taskTracker/ubuntu/jobcache/job_201407021636_0006/work/_bspZooKeeper/zoo.cfg] in directory /home/ubuntu/hdfstmp/mapred/local/taskTracker/ubuntu/jobcache/job_201407021636_0006/work/_bspZooKeeper
>>>>>>> 2014-07-02 17:01:35,238 INFO org.apache.giraph.zk.ZooKeeperManager: onlineZooKeeperServers: Shutdown hook added.
>>>>>>> 2014-07-02 17:01:35,238 INFO org.apache.giraph.zk.ZooKeeperManager: onlineZooKeeperServers: Connect attempt 0 of 10 max trying to connect to ec2-54-186-5-159.us-west-2.compute.amazonaws.com:22181 with poll msecs = 3000
>>>>>>> 2014-07-02 17:01:35,241 WARN org.apache.giraph.zk.ZooKeeperManager: onlineZooKeeperServers: Got ConnectException
>>>>>>> java.net.ConnectException: Connection refused
>>>>>>> 	at java.net.PlainSocketImpl.socketConnect(Native Method)
>>>>>>> 	at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
>>>>>>> 	at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)
>>>>>>> 	at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
>>>>>>> 	at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
>>>>>>> 	at java.net.Socket.connect(Socket.java:579)
>>>>>>> 	at org.apache.giraph.zk.ZooKeeperManager.onlineZooKeeperServers(ZooKeeperManager.java:701)
>>>>>>> 	at org.apache.giraph.graph.GraphTaskManager.startZooKeeperManager(GraphTaskManager.java:357)
>>>>>>> 	at org.apache.giraph.graph.GraphTaskManager.setup(GraphTaskManager.java:188)
>>>>>>> 	at org.apache.giraph.graph.GraphMapper.setup(GraphMapper.java:60)
>>>>>>> 	at org.apache.giraph.graph.GraphMapper.run(GraphMapper.java:90)
>>>>>>> 	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:763)
>>>>>>> 	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:369)
>>>>>>> 	at org.apache.hadoop.mapred.Child$4.run(Child.java:259)
>>>>>>> 	at java.security.AccessController.doPrivileged(Native Method)
>>>>>>> 	at javax.security.auth.Subject.doAs(Subject.java:415)
>>>>>>> 	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
>>>>>>> 	at org.apache.hadoop.mapred.Child.main(Child.java:253)
>>>>>>> 2014-07-02 17:01:38,242 INFO org.apache.giraph.zk.ZooKeeperManager: onlineZooKeeperServers: Connect attempt 1 of 10 max trying to connect to ec2-54-186-5-159.us-west-2.compute.amazonaws.com:22181 with poll msecs = 3000
>>>>>>> 2014-07-02 17:01:38,243 INFO org.apache.giraph.zk.ZooKeeperManager: onlineZooKeeperServers: Connected to ec2-54-186-5-159.us-west-2.compute.amazonaws.com/172.31.45.24:22181!
>>>>>>> 2014-07-02 17:01:38,243 INFO org.apache.giraph.zk.ZooKeeperManager: onlineZooKeeperServers: Creating my filestamp _bsp/_defaultZkManagerDir/job_201407021636_0006/_zkServer/ec2-54-186-5-159.us-west-2.compute.amazonaws.com 0
>>>>>>> 2014-07-02 17:01:40,249 INFO org.apache.giraph.graph.GraphTaskManager: setup: Chosen to run ZooKeeper...
>>>>>>> 2014-07-02 17:01:40,249 INFO org.apache.giraph.graph.GraphTaskManager: setup: Starting up BspServiceMaster (master thread)...
>>>>>>> 2014-07-02 17:01:40,260 INFO org.apache.giraph.bsp.BspService: BspService: Connecting to ZooKeeper with job job_201407021636_0006, 0 on ec2-54-186-5-159.us-west-2.compute.amazonaws.com:22181
>>>>>>> 2014-07-02 17:01:40,270 INFO org.apache.zookeeper.ZooKeeper: Client environment:zookeeper.version=3.3.3-1073969, built on 02/23/2011 22:27 GMT
>>>>>>> 2014-07-02 17:01:40,270 INFO org.apache.zookeeper.ZooKeeper: Client environment:host.name=ec2-54-186-5-159.us-west-2.compute.amazonaws.com
>>>>>>> 2014-07-02 17:01:40,270 INFO org.apache.zookeeper.ZooKeeper: Client environment:java.version=1.7.0_60
>>>>>>> 2014-07-02 17:01:40,270 INFO org.apache.zookeeper.ZooKeeper: Client environment:java.vendor=Oracle Corporation
>>>>>>> 2014-07-02 17:01:40,271 INFO org.apache.zookeeper.ZooKeeper: Client environment:java.home=/usr/lib/jvm/java-7-oracle/jre
>>>>>>> 2014-07-02 17:01:40,271 INFO org.apache.zookeeper.ZooKeeper: Client
>>>>>>>  environment:java.class.path=/home/ubuntu/hdfstmp/mapred/local/taskTracker/ubuntu/jobcache/job_201407021636_0006/jars/classes:/home/ubuntu/hdfstmp/mapred/local/taskTracker/ubuntu/jobcache/job_201407021636_0006/jars:/home/ubuntu/hdfstmp/mapred/local/taskTracker/ubuntu/jobcache/job_201407021636_0006/attempt_201407021636_0006_m_000000_0/work:/home/ubuntu/hadoop/bin/../conf:/usr/lib/jvm/java-7-oracle/lib/tools.jar:/home/ubuntu/hadoop/bin/..:/home/ubuntu/hadoop/bin/../hadoop-core-0.20.203.0.jar:/home/ubuntu/hadoop/bin/../lib/aspectjrt-1.6.5.jar:/home/ubuntu/hadoop/bin/../lib/aspectjtools-1.6.5.jar:/home/ubuntu/hadoop/bin/../lib/commons-beanutils-1.7.0.jar:/home/ubuntu/hadoop/bin/../lib/commons-beanutils-core-1.8.0.jar:/home/ubuntu/hadoop/bin/../lib/commons-cli-1.2.jar:/home/ubuntu/hadoop/bin/../lib/commons-codec-1.4.jar:/home/ubuntu/hadoop/bin/../lib/commons-collections-3.2.1.jar:/home/ubuntu/hadoop/bin/../lib/commons-configuration-1.6.jar:/home/ubuntu/hadoo
>>>>>>> p/bin/../lib/commons-daemon-1.0.1.jar:/home/ubuntu/hadoop/bin/../lib/commons-digester-1.8.jar:/home/ubuntu/hadoop/bin/../lib/commons-el-1.0.jar:/home/ubuntu/hadoop/bin/../lib/commons-httpclient-3.0.1.jar:/home/ubuntu/hadoop/bin/../lib/commons-lang-2.4.jar:/home/ubuntu/hadoop/bin/../lib/commons-logging-1.1.1.jar:/home/ubuntu/hadoop/bin/../lib/commons-logging-api-1.0.4.jar:/home/ubuntu/hadoop/bin/../lib/commons-math-2.1.jar:/home/ubuntu/hadoop/bin/../lib/commons-net-1.4.1.jar:/home/ubuntu/hadoop/bin/../lib/core-3.1.1.jar:/home/ubuntu/hadoop/bin/../lib/hsqldb-1.8.0.10.jar:/home/ubuntu/hadoop/bin/../lib/jackson-core-asl-1.0.1.jar:/home/ubuntu/hadoop/bin/../lib/jackson-mapper-asl-1.0.1.jar:/home/ubuntu/hadoop/bin/../lib/jasper-compiler-5.5.12.jar:/home/ubuntu/hadoop/bin/../lib/jasper-runtime-5.5.12.jar:/home/ubuntu/hadoop/bin/../lib/jets3t-0.6.1.jar:/home/ubuntu/hadoop/bin/../lib/jetty-6.1.26.jar:/home/ubuntu/hadoop/bin/../lib/jetty-util-6.1.26.jar:/home/ubu
>>>>>>> ntu/hadoop/bin/../lib/jsch-0.1.42.jar:/home/ubuntu/hadoop/bin/../lib/junit-4.5.jar:/home/ubuntu/hadoop/bin/../lib/kfs-0.2.2.jar:/home/ubuntu/hadoop/bin/../lib/log4j-1.2.15.jar:/home/ubuntu/hadoop/bin/../lib/mockito-all-1.8.5.jar:/home/ubuntu/hadoop/bin/../lib/oro-2.0.8.jar:/home/ubuntu/hadoop/bin/../lib/servlet-api-2.5-20081211.jar:/home/ubuntu/hadoop/bin/../lib/slf4j-api-1.4.3.jar:/home/ubuntu/hadoop/bin/../lib/slf4j-log4j12-1.4.3.jar:/home/ubuntu/hadoop/bin/../lib/xmlenc-0.52.jar:/home/ubuntu/hadoop/bin/../lib/jsp-2.1/jsp-2.1.jar:/home/ubuntu/hadoop/bin/../lib/jsp-2.1/jsp-api-2.1.jar
>>>>>>> 2014-07-02 17:01:40,271 INFO org.apache.zookeeper.ZooKeeper: Client environment:java.library.path=/home/ubuntu/hadoop/bin/../lib/native/Linux-amd64-64:/home/ubuntu/hdfstmp/mapred/local/taskTracker/ubuntu/jobcache/job_201407021636_0006/attempt_201407021636_0006_m_000000_0/work
>>>>>>> 2014-07-02 17:01:40,271 INFO org.apache.zookeeper.ZooKeeper: Client environment:java.io.tmpdir=/home/ubuntu/hdfstmp/mapred/local/taskTracker/ubuntu/jobcache/job_201407021636_0006/attempt_201407021636_0006_m_000000_0/work/tmp
>>>>>>> 2014-07-02 17:01:40,271 INFO org.apache.zookeeper.ZooKeeper: Client environment:java.compiler=<NA>
>>>>>>> 2014-07-02 17:01:40,271 INFO org.apache.zookeeper.ZooKeeper: Client environment:os.name=Linux
>>>>>>> 2014-07-02 17:01:40,271 INFO org.apache.zookeeper.ZooKeeper: Client environment:os.arch=amd64
>>>>>>> 2014-07-02 17:01:40,271 INFO org.apache.zookeeper.ZooKeeper: Client environment:os.version=3.2.0-58-virtual
>>>>>>> 2014-07-02 17:01:40,271 INFO org.apache.zookeeper.ZooKeeper: Client environment:user.name=ubuntu
>>>>>>> 2014-07-02 17:01:40,271 INFO org.apache.zookeeper.ZooKeeper: Client environment:user.home=/home/ubuntu
>>>>>>> 2014-07-02 17:01:40,271 INFO org.apache.zookeeper.ZooKeeper: Client environment:user.dir=/home/ubuntu/hdfstmp/mapred/local/taskTracker/ubuntu/jobcache/job_201407021636_0006/attempt_201407021636_0006_m_000000_0/work
>>>>>>> 2014-07-02 17:01:40,272 INFO org.apache.zookeeper.ZooKeeper: Initiating client connection, connectString=ec2-54-186-5-159.us-west-2.compute.amazonaws.com:22181 sessionTimeout=60000 watcher=org.apache.giraph.master.BspServiceMaster@4deb9df0
>>>>>>> 2014-07-02 17:01:40,284 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server ec2-54-186-5-159.us-west-2.compute.amazonaws.com/172.31.45.24:22181
>>>>>>> 2014-07-02 17:01:40,296 INFO org.apache.zookeeper.ClientCnxn: Socket connection established to ec2-54-186-5-159.us-west-2.compute.amazonaws.com/172.31.45.24:22181, initiating session
>>>>>>> 2014-07-02 17:01:42,336 INFO org.apache.zookeeper.ClientCnxn: Session establishment complete on server ec2-54-186-5-159.us-west-2.compute.amazonaws.com/172.31.45.24:22181, sessionid = 0x146f806378d0000, negotiated timeout = 600000
>>>>>>> 2014-07-02 17:01:42,337 INFO org.apache.giraph.bsp.BspService: process: Asynchronous connection complete.
>>>>>>> 2014-07-02 17:01:42,343 INFO org.apache.giraph.graph.GraphTaskManager: map: No need to do anything when not a worker
>>>>>>> 2014-07-02 17:01:42,343 INFO org.apache.giraph.graph.GraphTaskManager: cleanup: Starting for MASTER_ZOOKEEPER_ONLY
>>>>>>> 2014-07-02 17:01:45,033 INFO org.apache.giraph.bsp.BspService: getJobState: Job state already exists (/_hadoopBsp/job_201407021636_0006/_masterJobState)
>>>>>>> 2014-07-02 17:01:45,038 INFO org.apache.giraph.master.BspServiceMaster: becomeMaster: First child is '/_hadoopBsp/job_201407021636_0006/_masterElectionDir/ec2-54-186-5-159.us-west-2.compute.amazonaws.com_00000000000' and my bid is '/_hadoopBsp/job_201407021636_0006/_masterElectionDir/ec2-54-186-5-159.us-west-2.compute.amazonaws.com_00000000000'
>>>>>>> 2014-07-02 17:01:45,044 INFO org.apache.giraph.bsp.BspService: getApplicationAttempt: Node /_hadoopBsp/job_201407021636_0006/_applicationAttemptsDir already exists!
>>>>>>> 2014-07-02 17:01:45,049 INFO org.apache.giraph.bsp.BspService: process: applicationAttemptChanged signaled
>>>>>>> 2014-07-02 17:01:45,089 INFO org.apache.giraph.comm.netty.NettyServer: NettyServer: Using execution handler with 8 threads after requestFrameDecoder.
>>>>>>> 2014-07-02 17:01:45,111 INFO org.apache.giraph.comm.netty.NettyServer: start: Started server communication server: ec2-54-186-5-159.us-west-2.compute.amazonaws.com/172.31.45.24:30000 with up to 16 threads on bind attempt 0 with sendBufferSize = 32768 receiveBufferSize = 524288 backlog = 1
>>>>>>> 2014-07-02 17:01:45,116 INFO org.apache.giraph.comm.netty.NettyClient: NettyClient: Using execution handler with 8 threads after requestEncoder.
>>>>>>> 2014-07-02 17:01:45,119 INFO org.apache.giraph.master.BspServiceMaster: becomeMaster: I am now the master!
>>>>>>> 2014-07-02 17:01:45,126 INFO org.apache.giraph.bsp.BspService: getApplicationAttempt: Node /_hadoopBsp/job_201407021636_0006/_applicationAttemptsDir already exists!
>>>>>>> 2014-07-02 17:01:45,140 INFO org.apache.giraph.io.formats.GiraphFileInputFormat: Total input paths to process : 1
>>>>>>> 2014-07-02 17:01:45,149 INFO org.apache.giraph.master.BspServiceMaster: generateVertexInputSplits: Got 2 input splits for 1 input threads
>>>>>>> 2014-07-02 17:01:45,149 INFO org.apache.giraph.master.BspServiceMaster: createVertexInputSplits: Starting to write input split data to zookeeper with 1 threads
>>>>>>> 2014-07-02 17:01:45,163 INFO org.apache.giraph.master.BspServiceMaster: createVertexInputSplits: Done writing input split data to zookeeper
>>>>>>> 2014-07-02 17:01:45,173 INFO org.apache.giraph.comm.netty.NettyClient: Using Netty without authentication.
>>>>>>> 2014-07-02 17:01:45,187 INFO org.apache.giraph.comm.netty.NettyClient: connectAllAddresses: Successfully added 1 connections, (1 total connected) 0 failed, 0 failures total.
>>>>>>> 2014-07-02 17:01:45,188 INFO org.apache.giraph.partition.PartitionUtils: computePartitionCount: Creating 1, default would have been 1 partitions.
>>>>>>> 2014-07-02 17:01:45,350 INFO org.apache.giraph.comm.netty.NettyServer: start: Using Netty without authentication.
>>>>>>> 2014-07-02 17:06:45,347 INFO org.apache.giraph.master.BspServiceMaster: barrierOnWorkerList: 0 out of 1 workers finished on superstep -1 on path /_hadoopBsp/job_201407021636_0006/_vertexInputSplitDoneDir
>>>>>>> 2014-07-02 17:06:45,349 INFO org.apache.giraph.master.BspServiceMaster: barrierOnWorkerList: Waiting on [ec2-54-186-5-159.us-west-2.compute.amazonaws.com_1]
>>>>>>> 2014-07-02 17:11:45,355 INFO org.apache.giraph.master.BspServiceMaster: barrierOnWorkerList: 0 out of 1 workers finished on superstep -1 on path /_hadoopBsp/job_201407021636_0006/_vertexInputSplitDoneDir
>>>>>>> 2014-07-02 17:11:45,355 INFO org.apache.giraph.master.BspServiceMaster: barrierOnWorkerList: Waiting on [ec2-54-186-5-159.us-west-2.compute.amazonaws.com_1]
>>>>>>> 2014-07-02 17:12:07,201 INFO org.apache.giraph.zk.ZooKeeperManager: run: Shutdown hook started.
>>>>>>> 2014-07-02 17:12:07,202 WARN org.apache.giraph.zk.ZooKeeperManager: onlineZooKeeperServers: Forced a shutdown hook kill of the ZooKeeper process.
>>>>>>> 2014-07-02 17:12:07,518 INFO org.apache.zookeeper.ClientCnxn: Unable to read additional data from server sessionid 0x146f806378d0000, likely server has closed socket, closing socket connection and attempting reconnect
>>>>>>> 2014-07-02 17:12:07,518 INFO org.apache.giraph.zk.ZooKeeperManager: onlineZooKeeperServers: ZooKeeper process exited with 143 (note that 143 typically means killed).
>>>>>>> ===========================================================================
>>>>>>>
>>>>>>> The second one, task_201407021636_0006_m_000001, goes to the FAILED
>>>>>>> state.  syslog:
>>>>>>>
>>>>>>> ===========================================================================
>>>>>>>
>>>>>>> 2014-07-02 17:01:38,016 INFO org.apache.hadoop.util.NativeCodeLoader: Loaded the native-hadoop library
>>>>>>> 2014-07-02 17:01:38,203 WARN org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Source name ugi already exists!
>>>>>>> 2014-07-02 17:01:38,379 INFO org.apache.giraph.graph.GraphTaskManager: setup: Log level remains at info
>>>>>>> 2014-07-02 17:01:40,280 INFO org.apache.giraph.graph.GraphTaskManager: Distributed cache is empty. Assuming fatjar.
>>>>>>> 2014-07-02 17:01:40,281 INFO org.apache.giraph.graph.GraphTaskManager: setup: classpath @ /home/ubuntu/hdfstmp/mapred/local/taskTracker/ubuntu/jobcache/job_201407021636_0006/jars/job.jar for job Giraph: org.apache.giraph.examples.SimpleShortestPathsVertex
>>>>>>> 2014-07-02 17:01:40,317 INFO org.apache.giraph.zk.ZooKeeperManager: createCandidateStamp: Made the directory _bsp/_defaultZkManagerDir/job_201407021636_0006
>>>>>>> 2014-07-02 17:01:40,320 INFO org.apache.giraph.zk.ZooKeeperManager: createCandidateStamp: Creating my filestamp _bsp/_defaultZkManagerDir/job_201407021636_0006/_task/ec2-54-186-5-159.us-west-2.compute.amazonaws.com 1
>>>>>>> 2014-07-02 17:01:42,109 INFO org.apache.giraph.zk.ZooKeeperManager: getZooKeeperServerList: For task 1, got file 'zkServerList_ec2-54-186-5-159.us-west-2.compute.amazonaws.com 0 ' (polling period is 3000)
>>>>>>> 2014-07-02 17:01:42,313 INFO org.apache.giraph.zk.ZooKeeperManager: getZooKeeperServerList: Found [ec2-54-186-5-159.us-west-2.compute.amazonaws.com, 0] 2 hosts in filename 'zkServerList_ec2-54-186-5-159.us-west-2.compute.amazonaws.com 0 '
>>>>>>> 2014-07-02 17:01:42,316 INFO org.apache.giraph.zk.ZooKeeperManager: onlineZooKeeperServers: Got [ec2-54-186-5-159.us-west-2.compute.amazonaws.com] 1 hosts from 1 ready servers when 1 required (polling period is 3000) on attempt 0
>>>>>>> 2014-07-02 17:01:42,316 INFO org.apache.giraph.graph.GraphTaskManager: setup: Starting up BspServiceWorker...
>>>>>>> 2014-07-02 17:01:42,330 INFO org.apache.giraph.bsp.BspService: BspService: Connecting to ZooKeeper with job job_201407021636_0006, 1 on ec2-54-186-5-159.us-west-2.compute.amazonaws.com:22181
>>>>>>> 2014-07-02 17:01:42,337 INFO org.apache.zookeeper.ZooKeeper: Client environment:zookeeper.version=3.3.3-1073969, built on 02/23/2011 22:27 GMT
>>>>>>> 2014-07-02 17:01:42,337 INFO org.apache.zookeeper.ZooKeeper: Client environment:host.name=ec2-54-186-5-159.us-west-2.compute.amazonaws.com
>>>>>>> 2014-07-02 17:01:42,337 INFO org.apache.zookeeper.ZooKeeper: Client environment:java.version=1.7.0_60
>>>>>>> 2014-07-02 17:01:42,337 INFO org.apache.zookeeper.ZooKeeper: Client environment:java.vendor=Oracle Corporation
>>>>>>> 2014-07-02 17:01:42,337 INFO org.apache.zookeeper.ZooKeeper: Client environment:java.home=/usr/lib/jvm/java-7-oracle/jre
>>>>>>> 2014-07-02 17:01:42,338 INFO org.apache.zookeeper.ZooKeeper: Client
>>>>>>>  environment:java.class.path=/home/ubuntu/hdfstmp/mapred/local/taskTracker/ubuntu/jobcache/job_201407021636_0006/jars/classes:/home/ubuntu/hdfstmp/mapred/local/taskTracker/ubuntu/jobcache/job_201407021636_0006/jars:/home/ubuntu/hdfstmp/mapred/local/taskTracker/ubuntu/jobcache/job_201407021636_0006/attempt_201407021636_0006_m_000001_0/work:/home/ubuntu/hadoop/bin/../conf:/usr/lib/jvm/java-7-oracle/lib/tools.jar:/home/ubuntu/hadoop/bin/..:/home/ubuntu/hadoop/bin/../hadoop-core-0.20.203.0.jar:/home/ubuntu/hadoop/bin/../lib/aspectjrt-1.6.5.jar:/home/ubuntu/hadoop/bin/../lib/aspectjtools-1.6.5.jar:/home/ubuntu/hadoop/bin/../lib/commons-beanutils-1.7.0.jar:/home/ubuntu/hadoop/bin/../lib/commons-beanutils-core-1.8.0.jar:/home/ubuntu/hadoop/bin/../lib/commons-cli-1.2.jar:/home/ubuntu/hadoop/bin/../lib/commons-codec-1.4.jar:/home/ubuntu/hadoop/bin/../lib/commons-collections-3.2.1.jar:/home/ubuntu/hadoop/bin/../lib/commons-configuration-1.6.jar:/home/ubuntu/hadoo
>>>>>>> p/bin/../lib/commons-daemon-1.0.1.jar:/home/ubuntu/hadoop/bin/../lib/commons-digester-1.8.jar:/home/ubuntu/hadoop/bin/../lib/commons-el-1.0.jar:/home/ubuntu/hadoop/bin/../lib/commons-httpclient-3.0.1.jar:/home/ubuntu/hadoop/bin/../lib/commons-lang-2.4.jar:/home/ubuntu/hadoop/bin/../lib/commons-logging-1.1.1.jar:/home/ubuntu/hadoop/bin/../lib/commons-logging-api-1.0.4.jar:/home/ubuntu/hadoop/bin/../lib/commons-math-2.1.jar:/home/ubuntu/hadoop/bin/../lib/commons-net-1.4.1.jar:/home/ubuntu/hadoop/bin/../lib/core-3.1.1.jar:/home/ubuntu/hadoop/bin/../lib/hsqldb-1.8.0.10.jar:/home/ubuntu/hadoop/bin/../lib/jackson-core-asl-1.0.1.jar:/home/ubuntu/hadoop/bin/../lib/jackson-mapper-asl-1.0.1.jar:/home/ubuntu/hadoop/bin/../lib/jasper-compiler-5.5.12.jar:/home/ubuntu/hadoop/bin/../lib/jasper-runtime-5.5.12.jar:/home/ubuntu/hadoop/bin/../lib/jets3t-0.6.1.jar:/home/ubuntu/hadoop/bin/../lib/jetty-6.1.26.jar:/home/ubuntu/hadoop/bin/../lib/jetty-util-6.1.26.jar:/home/ubu
>>>>>>> ntu/hadoop/bin/../lib/jsch-0.1.42.jar:/home/ubuntu/hadoop/bin/../lib/junit-4.5.jar:/home/ubuntu/hadoop/bin/../lib/kfs-0.2.2.jar:/home/ubuntu/hadoop/bin/../lib/log4j-1.2.15.jar:/home/ubuntu/hadoop/bin/../lib/mockito-all-1.8.5.jar:/home/ubuntu/hadoop/bin/../lib/oro-2.0.8.jar:/home/ubuntu/hadoop/bin/../lib/servlet-api-2.5-20081211.jar:/home/ubuntu/hadoop/bin/../lib/slf4j-api-1.4.3.jar:/home/ubuntu/hadoop/bin/../lib/slf4j-log4j12-1.4.3.jar:/home/ubuntu/hadoop/bin/../lib/xmlenc-0.52.jar:/home/ubuntu/hadoop/bin/../lib/jsp-2.1/jsp-2.1.jar:/home/ubuntu/hadoop/bin/../lib/jsp-2.1/jsp-api-2.1.jar
>>>>>>> 2014-07-02 17:01:42,338 INFO org.apache.zookeeper.ZooKeeper: Client environment:java.library.path=/home/ubuntu/hadoop/bin/../lib/native/Linux-amd64-64:/home/ubuntu/hdfstmp/mapred/local/taskTracker/ubuntu/jobcache/job_201407021636_0006/attempt_201407021636_0006_m_000001_0/work
>>>>>>> 2014-07-02 17:01:42,338 INFO org.apache.zookeeper.ZooKeeper: Client environment:java.io.tmpdir=/home/ubuntu/hdfstmp/mapred/local/taskTracker/ubuntu/jobcache/job_201407021636_0006/attempt_201407021636_0006_m_000001_0/work/tmp
>>>>>>> 2014-07-02 17:01:42,338 INFO org.apache.zookeeper.ZooKeeper: Client environment:java.compiler=<NA>
>>>>>>> 2014-07-02 17:01:42,338 INFO org.apache.zookeeper.ZooKeeper: Client environment:os.name=Linux
>>>>>>> 2014-07-02 17:01:42,338 INFO org.apache.zookeeper.ZooKeeper: Client environment:os.arch=amd64
>>>>>>> 2014-07-02 17:01:42,338 INFO org.apache.zookeeper.ZooKeeper: Client environment:os.version=3.2.0-58-virtual
>>>>>>> 2014-07-02 17:01:42,338 INFO org.apache.zookeeper.ZooKeeper: Client environment:user.name=ubuntu
>>>>>>> 2014-07-02 17:01:42,338 INFO org.apache.zookeeper.ZooKeeper: Client environment:user.home=/home/ubuntu
>>>>>>> 2014-07-02 17:01:42,338 INFO org.apache.zookeeper.ZooKeeper: Client environment:user.dir=/home/ubuntu/hdfstmp/mapred/local/taskTracker/ubuntu/jobcache/job_201407021636_0006/attempt_201407021636_0006_m_000001_0/work
>>>>>>> 2014-07-02 17:01:42,339 INFO org.apache.zookeeper.ZooKeeper: Initiating client connection, connectString=ec2-54-186-5-159.us-west-2.compute.amazonaws.com:22181 sessionTimeout=60000 watcher=org.apache.giraph.worker.BspServiceWorker@54a5733f
>>>>>>> 2014-07-02 17:01:42,347 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server ec2-54-186-5-159.us-west-2.compute.amazonaws.com/172.31.45.24:22181
>>>>>>> 2014-07-02 17:01:42,347 INFO org.apache.zookeeper.ClientCnxn: Socket connection established to ec2-54-186-5-159.us-west-2.compute.amazonaws.com/172.31.45.24:22181, initiating session
>>>>>>> 2014-07-02 17:01:42,812 INFO org.apache.zookeeper.ClientCnxn: Session establishment complete on server ec2-54-186-5-159.us-west-2.compute.amazonaws.com/172.31.45.24:22181, sessionid = 0x146f806378d0001, negotiated timeout = 600000
>>>>>>> 2014-07-02 17:01:42,814 INFO org.apache.giraph.bsp.BspService: process: Asynchronous connection complete.
>>>>>>> 2014-07-02 17:01:42,819 INFO org.apache.giraph.comm.netty.NettyWorkerServer: createMessageStoreFactory: Using ByteArrayMessagesPerVertexStore since there is no combiner
>>>>>>> 2014-07-02 17:01:42,881 INFO org.apache.giraph.comm.netty.NettyServer: NettyServer: Using execution handler with 8 threads after requestFrameDecoder.
>>>>>>> 2014-07-02 17:01:42,904 INFO org.apache.giraph.comm.netty.NettyServer: start: Started server communication server: ec2-54-186-5-159.us-west-2.compute.amazonaws.com/172.31.45.24:30001 with up to 16 threads on bind attempt 0 with sendBufferSize = 32768 receiveBufferSize = 524288 backlog = 1
>>>>>>> 2014-07-02 17:01:42,907 INFO org.apache.giraph.comm.netty.NettyClient: NettyClient: Using execution handler with 8 threads after requestEncoder.
>>>>>>> 2014-07-02 17:01:42,914 INFO org.apache.giraph.graph.GraphTaskManager: setup: Registering health of this worker...
>>>>>>> 2014-07-02 17:01:45,049 INFO org.apache.giraph.bsp.BspService: process: applicationAttemptChanged signaled
>>>>>>> 2014-07-02 17:01:45,068 WARN org.apache.giraph.bsp.BspService: process: Unknown and unprocessed event (path=/_hadoopBsp/job_201407021636_0006/_applicationAttemptsDir/0/_superstepDir, type=NodeChildrenChanged, state=SyncConnected)
>>>>>>> 2014-07-02 17:01:45,076 INFO org.apache.giraph.worker.BspServiceWorker: registerHealth: Created my health node for attempt=0, superstep=-1 with /_hadoopBsp/job_201407021636_0006/_applicationAttemptsDir/0/_superstepDir/-1/_workerHealthyDir/ec2-54-186-5-159.us-west-2.compute.amazonaws.com_1 and workerInfo= Worker(hostname=ec2-54-186-5-159.us-west-2.compute.amazonaws.com, MRtaskID=1, port=30001)
>>>>>>> 2014-07-02 17:01:45,183 INFO org.apache.giraph.comm.netty.NettyServer: start: Using Netty without authentication.
>>>>>>> 2014-07-02 17:01:45,339 INFO org.apache.giraph.bsp.BspService: process: partitionAssignmentsReadyChanged (partitions are assigned)
>>>>>>> 2014-07-02 17:01:45,345 INFO org.apache.giraph.worker.BspServiceWorker: startSuperstep: Master(hostname=ec2-54-186-5-159.us-west-2.compute.amazonaws.com, MRtaskID=0, port=30000)
>>>>>>> 2014-07-02 17:01:45,345 INFO org.apache.giraph.worker.BspServiceWorker: startSuperstep: Ready for computation on superstep -1 since worker selection and vertex range assignments are done in /_hadoopBsp/job_201407021636_0006/_applicationAttemptsDir/0/_superstepDir/-1/_addressesAndPartitions
>>>>>>> 2014-07-02 17:01:45,346 INFO org.apache.giraph.comm.netty.NettyClient: Using Netty without authentication.
>>>>>>> 2014-07-02 17:01:45,354 INFO org.apache.giraph.comm.netty.NettyClient: connectAllAddresses: Successfully added 1 connections, (1 total connected) 0 failed, 0 failures total.
>>>>>>> 2014-07-02 17:01:45,359 INFO org.apache.giraph.worker.BspServiceWorker: loadInputSplits: Using 1 thread(s), originally 1 threads(s) for 2 total splits.
>>>>>>> 2014-07-02 17:01:45,362 INFO org.apache.giraph.comm.SendPartitionCache: SendPartitionCache: maxVerticesPerTransfer = 10000
>>>>>>> 2014-07-02 17:01:45,362 INFO org.apache.giraph.comm.SendPartitionCache: SendPartitionCache: maxEdgesPerTransfer = 80000
>>>>>>> 2014-07-02 17:01:45,372 INFO org.apache.giraph.worker.InputSplitsHandler: reserveInputSplit: Reserved input split path /_hadoopBsp/job_201407021636_0006/_vertexInputSplitDir/0, overall roughly 0.0% input splits reserved
>>>>>>> 2014-07-02 17:01:45,373 INFO org.apache.giraph.worker.InputSplitsCallable: getInputSplit: Reserved /_hadoopBsp/job_201407021636_0006/_vertexInputSplitDir/0 from ZooKeeper and got input split 'hdfs://ec2-54-186-5-159.us-west-2.compute.amazonaws.com:54310/user/ubuntu/input/CA.txt:0+67108864'
>>>>>>> 2014-07-02 17:01:46,615 INFO org.apache.giraph.worker.VertexInputSplitsCallable: readVertexInputSplit: Loaded 250000 vertices at 200412.01085983627 vertices/sec 697447 edges at 560085.1663971642 edges/sec Memory (free/total/max) = 112.94M / 182.50M / 182.50M
>>>>>>> 2014-07-02 17:01:47,440 INFO org.apache.giraph.worker.VertexInputSplitsCallable: readVertexInputSplit: Loaded 500000 vertices at 241131.36490688394 vertices/sec 1419367 edges at 685221.3493060106 edges/sec Memory (free/total/max) = 45.07M / 187.50M / 187.50M
>>>>>>> 2014-07-02 17:01:51,322 INFO org.apache.giraph.worker.VertexInputSplitsCallable: readVertexInputSplit: Loaded 750000 vertices at 125921.72750649283 vertices/sec 2149814 edges at 361077.323111284 edges/sec Memory (free/total/max) = 16.73M / 189.50M / 189.50M
>>>>>>> 2014-07-02 17:01:55,205 ERROR org.apache.giraph.worker.BspServiceWorker: unregisterHealth: Got failure, unregistering health on /_hadoopBsp/job_201407021636_0006/_applicationAttemptsDir/0/_superstepDir/-1/_workerHealthyDir/ec2-54-186-5-159.us-west-2.compute.amazonaws.com_1 on superstep -1
>>>>>>> 2014-07-02 17:01:55,213 INFO org.apache.hadoop.mapred.TaskLogsTruncater: Initializing logs' truncater with mapRetainSize=-1 and reduceRetainSize=-1
>>>>>>> 2014-07-02 17:01:55,716 INFO org.apache.hadoop.io.nativeio.NativeIO: Initialized cache for UID to User mapping with a cache timeout of 14400 seconds.
>>>>>>> 2014-07-02 17:01:55,717 INFO org.apache.hadoop.io.nativeio.NativeIO: Got UserName ubuntu for UID 1000 from the native implementation
>>>>>>> 2014-07-02 17:01:55,718 WARN org.apache.hadoop.mapred.Child: Error running child
>>>>>>> java.lang.IllegalStateException: run: Caught an unrecoverable exception waitFor: ExecutionException occurred while waiting for org.apache.giraph.utils.ProgressableUtils$FutureWaitable@36df8f5
>>>>>>> 	at org.apache.giraph.graph.GraphMapper.run(GraphMapper.java:102)
>>>>>>> 	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:763)
>>>>>>> 	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:369)
>>>>>>> 	at org.apache.hadoop.mapred.Child$4.run(Child.java:259)
>>>>>>> 	at java.security.AccessController.doPrivileged(Native Method)
>>>>>>> 	at javax.security.auth.Subject.doAs(Subject.java:415)
>>>>>>> 	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
>>>>>>> 	at org.apache.hadoop.mapred.Child.main(Child.java:253)
>>>>>>> Caused by: java.lang.IllegalStateException: waitFor: ExecutionException occurred while waiting for org.apache.giraph.utils.ProgressableUtils$FutureWaitable@36df8f5
>>>>>>> 	at org.apache.giraph.utils.ProgressableUtils.waitFor(ProgressableUtils.java:151)
>>>>>>> 	at org.apache.giraph.utils.ProgressableUtils.waitForever(ProgressableUtils.java:111)
>>>>>>> 	at org.apache.giraph.utils.ProgressableUtils.getFutureResult(ProgressableUtils.java:73)
>>>>>>> 	at org.apache.giraph.utils.ProgressableUtils.getResultsWithNCallables(ProgressableUtils.java:192)
>>>>>>> 	at org.apache.giraph.worker.BspServiceWorker.loadInputSplits(BspServiceWorker.java:276)
>>>>>>> 	at org.apache.giraph.worker.BspServiceWorker.loadVertices(BspServiceWorker.java:323)
>>>>>>> 	at org.apache.giraph.worker.BspServiceWorker.setup(BspServiceWorker.java:506)
>>>>>>> 	at org.apache.giraph.graph.GraphTaskManager.execute(GraphTaskManager.java:230)
>>>>>>> 	at org.apache.giraph.graph.GraphMapper.run(GraphMapper.java:92)
>>>>>>> 	... 7 more
>>>>>>> Caused by: java.util.concurrent.ExecutionException: java.lang.OutOfMemoryError: Java heap space
>>>>>>> 	at java.util.concurrent.FutureTask.report(FutureTask.java:122)
>>>>>>> 	at java.util.concurrent.FutureTask.get(FutureTask.java:202)
>>>>>>> 	at org.apache.giraph.utils.ProgressableUtils$FutureWaitable.waitFor(ProgressableUtils.java:271)
>>>>>>> 	at org.apache.giraph.utils.ProgressableUtils.waitFor(ProgressableUtils.java:143)
>>>>>>> 	... 15 more
>>>>>>> Caused by: java.lang.OutOfMemoryError: Java heap space
>>>>>>> 	at java.util.concurrent.ConcurrentHashMap$Segment.rehash(ConcurrentHashMap.java:501)
>>>>>>> 	at java.util.concurrent.ConcurrentHashMap$Segment.put(ConcurrentHashMap.java:460)
>>>>>>> 	at java.util.concurrent.ConcurrentHashMap.put(ConcurrentHashMap.java:1130)
>>>>>>> 	at org.apache.giraph.partition.SimplePartition.addPartition(SimplePartition.java:87)
>>>>>>> 	at org.apache.giraph.partition.SimplePartitionStore.addPartition(SimplePartitionStore.java:71)
>>>>>>> 	at org.apache.giraph.comm.requests.SendVertexRequest.doRequest(SendVertexRequest.java:81)
>>>>>>> 	at org.apache.giraph.comm.netty.NettyWorkerClientRequestProcessor.doRequest(NettyWorkerClientRequestProcessor.java:470)
>>>>>>> 	at org.apache.giraph.comm.netty.NettyWorkerClientRequestProcessor.sendPartitionRequest(NettyWorkerClientRequestProcessor.java:203)
>>>>>>> 	at org.apache.giraph.comm.netty.NettyWorkerClientRequestProcessor.sendVertexRequest(NettyWorkerClientRequestProcessor.java:267)
>>>>>>> 	at org.apache.giraph.worker.VertexInputSplitsCallable.readInputSplit(VertexInputSplitsCallable.java:140)
>>>>>>> 	at org.apache.giraph.worker.InputSplitsCallable.loadInputSplit(InputSplitsCallable.java:220)
>>>>>>> 	at org.apache.giraph.worker.InputSplitsCallable.call(InputSplitsCallable.java:161)
>>>>>>> 	at org.apache.giraph.worker.InputSplitsCallable.call(InputSplitsCallable.java:58)
>>>>>>> 	at org.apache.giraph.utils.LogStacktraceCallable.call(LogStacktraceCallable.java:51)
>>>>>>> 	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>>>>>>> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>>>>> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>>>>> 	at java.lang.Thread.run(Thread.java:745)
>>>>>>> 2014-07-02 17:01:55,722 INFO org.apache.hadoop.mapred.Task: Runnning cleanup for the task
>>>>>>>
>>>>>>>
>>>>>>> ===========================================================================
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> I've tried increasing the Java heap size in
>>>>>>> hadoop/conf/mapred-site.xml by adding this:
>>>>>>>   <property>
>>>>>>>     <name>mapred.child.java.opts</name>
>>>>>>>     <value>-Xmx1024m</value>
>>>>>>>   </property>
>>>>>>>
>>>>>>> But that just caused the entire job to fail from the start.
>>>>>>>
>>>>>>> Before using this version of Giraph, I tried 1.0.0 and 1.1.0-RC0,
>>>>>>> but those versions gave me different errors to debug, related to
>>>>>>> problems with Hadoop itself.  So the Giraph version I'm currently
>>>>>>> using seems to be the best fit for me because its errors seem more
>>>>>>> manageable.
>>>>>>>
>>>>>>> What can I do to fix this error?  Since Giraph is built for
>>>>>>> large-scale graph processing, I assume someone testing large graphs
>>>>>>> has run into this problem before, but I searched the mailing list
>>>>>>> archives and couldn't find anything.
>>>>>>>
>>>>>>> I can provide more information if you need it.  Thanks a lot.
>>>>>>>
>>>>>>> Bryan Rowe
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> *Then Young Han replied:*
>>>>>>>
>>>>>>> 10 minutes seems way too long to load 91 MB from HDFS. Are you
>>>>>>> sure your graph's format is correct? For the JSON input formats, each line
>>>>>>> of your file should be:
>>>>>>>
>>>>>>> [vertex id, vertex value, [[dst id, edge weight], [dst id, edge
>>>>>>> weight], ..., [dst id, edge weight]]]
>>>>>>>
>>>>>>> You can set the vertex value for every vertex to 0 (SSSP will
>>>>>>> overwrite that value in the first superstep). If your graph doesn't have
>>>>>>> edge weights, you can just set them to 1.
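A line in that format is valid JSON, so a quick sanity check is possible outside of Giraph. The short sketch below is my own illustration (the helper name `check_line` is mine, not part of Giraph); it only checks the three-element shape described above, not Giraph's type parameters:

```python
import json

def check_line(line):
    """Return True if a line matches [vertex id, vertex value, [[dst id, edge weight], ...]]."""
    try:
        record = json.loads(line)
    except ValueError:
        return False
    if not (isinstance(record, list) and len(record) == 3):
        return False
    vertex_id, vertex_value, edges = record
    if not isinstance(vertex_id, (int, float)) or not isinstance(vertex_value, (int, float)):
        return False
    # Every edge must itself be a [dst id, edge weight] pair.
    return isinstance(edges, list) and all(
        isinstance(e, list) and len(e) == 2 for e in edges
    )

# The first line of the sample graph parses cleanly; a truncated edge does not.
print(check_line("[0,0,[[1,1],[4,1],[5,1]]]"))  # True
print(check_line("[0,0,[[1,1],[4]]]"))          # False
```

Running it over the whole input file (one call per line) would flag any malformed line before the job is submitted.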
>>>>>>>
>>>>>>> Also, have you tried a larger Xmx value? E.g., 4096m or 8192m.
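Concretely, that would mean changing the earlier mapred-site.xml property along these lines (a sketch only; 4096m is an illustrative value and must fit within the node's physical memory):

```xml
<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx4096m</value>
</property>
```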
>>>>>>>
>>>>>>> Young
>>>>>>>
>>>>>>>
>>>>>>> *Then I replied:*
>>>>>>> Hi Young,
>>>>>>>
>>>>>>> I believe my graph has the correct format.
>>>>>>> Here are the first 40 lines of the graph I'm using:
>>>>>>> [0,0,[[1,1],[4,1],[5,1]]]
>>>>>>> [1,0,[[0,1],[2,1],[3,1]]]
>>>>>>> [2,0,[[1,1],[6,1],[7,1]]]
>>>>>>> [3,0,[[8,1],[1,1],[9,1]]]
>>>>>>> [4,0,[[0,1],[10,1],[11,1]]]
>>>>>>> [5,0,[[0,1]]]
>>>>>>> [6,0,[[2,1],[12,1]]]
>>>>>>> [7,0,[[2,1],[12,1],[13,1]]]
>>>>>>> [8,0,[[11,1],[3,1],[60,1]]]
>>>>>>> [9,0,[[3,1]]]
>>>>>>> [10,0,[[35,1],[4,1],[38,1]]]
>>>>>>> [11,0,[[8,1],[59,1],[4,1]]]
>>>>>>> [12,0,[[41,1],[6,1],[7,1]]]
>>>>>>> [13,0,[[89,1],[90,1],[91,1],[7,1]]]
>>>>>>> [14,0,[[18,1],[19,1],[15,1]]]
>>>>>>> [15,0,[[16,1],[17,1],[14,1]]]
>>>>>>> [16,0,[[20,1],[21,1],[22,1],[15,1]]]
>>>>>>> [17,0,[[24,1],[23,1],[21,1],[15,1]]]
>>>>>>> [18,0,[[24,1],[14,1]]]
>>>>>>> [19,0,[[25,1],[22,1],[14,1]]]
>>>>>>> [20,0,[[16,1],[25,1],[26,1]]]
>>>>>>> [21,0,[[16,1],[17,1],[30,1]]]
>>>>>>> [22,0,[[16,1],[19,1]]]
>>>>>>> [23,0,[[17,1],[105,1]]]
>>>>>>> [24,0,[[17,1],[18,1],[58,1]]]
>>>>>>> [25,0,[[27,1],[19,1],[20,1]]]
>>>>>>> [26,0,[[28,1],[27,1],[20,1]]]
>>>>>>> [27,0,[[25,1],[26,1],[29,1]]]
>>>>>>> [28,0,[[26,1],[29,1],[30,1]]]
>>>>>>> [29,0,[[27,1],[28,1],[31,1]]]
>>>>>>> [30,0,[[32,1],[33,1],[28,1],[21,1]]]
>>>>>>> [31,0,[[34,1],[29,1]]]
>>>>>>> [32,0,[[105,1],[30,1],[39,1]]]
>>>>>>> [33,0,[[30,1]]]
>>>>>>> [34,0,[[38,1],[31,1]]]
>>>>>>> [35,0,[[10,1],[36,1],[37,1]]]
>>>>>>> [36,0,[[40,1],[35,1],[39,1]]]
>>>>>>> [37,0,[[41,1],[35,1]]]
>>>>>>> [38,0,[[10,1],[34,1]]]
>>>>>>> [39,0,[[32,1],[58,1],[36,1],[119,1]]]
>>>>>>> [40,0,[[90,1],[36,1]]]
>>>>>>>
>>>>>>> Also, sorry about sending the email twice.  My email client messed
>>>>>>> up.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Bryan
>>>>>>>
>>>>>>> I've also tried a larger graph.  It ran for 2 hours and then
>>>>>>> failed, I believe with different error messages.
>>>>>>>
>>>>>>> Bryan
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
