incubator-giraph-user mailing list archives

From Yingyi Bu <buyin...@gmail.com>
Subject Re: PageRank OOM Exception
Date Fri, 18 Nov 2011 07:24:46 GMT
Hi Avery,

    Thanks a lot for your help!!
    I used your settings, and the OOM is gone now!   However, after the job
had run for 10 minutes, one worker failed, and a while later all
mappers failed.  Attached below are mapper logs from two nodes.  It seems
they cannot connect to ZooKeeper.  The workers run well until the
highlighted exception.  Am I missing something in the job settings?
    Thanks again!!

Best regards,
Yingyi



Mapper log on Node-1:
2011-11-17 22:56:39,044 INFO org.apache.giraph.zk.ZooKeeperManager:
getZooKeeperServerList: For task 0, got file 'zkServerList_asterix-010 0 '
(polling period is 3000)
2011-11-17 22:56:39,044 INFO org.apache.giraph.zk.ZooKeeperManager:
getZooKeeperServerList: Found [asterix-010, 0] 2 hosts in filename
'zkServerList_asterix-010 0 '
2011-11-17 22:56:39,046 INFO org.apache.giraph.zk.ZooKeeperManager:
onlineZooKeeperServers: Trying to delete old directory
/mnt/data/sda/space/yingyi/hdfsdata_giraph/mapred/local/taskTracker/yingyib/jobcache/job_201111172247_0003/work/_bspZooKeeper
2011-11-17 22:56:39,049 INFO org.apache.giraph.zk.ZooKeeperManager:
generateZooKeeperConfigFile: Creating file
/mnt/data/sda/space/yingyi/hdfsdata_giraph/mapred/local/taskTracker/yingyib/jobcache/job_201111172247_0003/work/_bspZooKeeper/zoo.cfg
in
/mnt/data/sda/space/yingyi/hdfsdata_giraph/mapred/local/taskTracker/yingyib/jobcache/job_201111172247_0003/work/_bspZooKeeper
with base port 22181
2011-11-17 22:56:39,049 INFO org.apache.giraph.zk.ZooKeeperManager:
generateZooKeeperConfigFile: Make directory of _bspZooKeeper = true
2011-11-17 22:56:39,049 INFO org.apache.giraph.zk.ZooKeeperManager:
generateZooKeeperConfigFile: Delete of zoo.cfg = false
2011-11-17 22:56:39,050 INFO org.apache.giraph.zk.ZooKeeperManager:
onlineZooKeeperServers: Attempting to start ZooKeeper server with command
[/mnt/data/sda/space/yingyi/tools/java/jre/bin/java, -Xmx256m,
-XX:ParallelGCThreads=4, -XX:+UseConcMarkSweepGC,
-XX:CMSInitiatingOccupancyFraction=70, -XX:MaxGCPauseMillis=100, -cp,
/mnt/data/sda/space/yingyi/hdfsdata_giraph/mapred/local/taskTracker/yingyib/jobcache/job_201111172247_0003/jars/job.jar,
org.apache.zookeeper.server.quorum.QuorumPeerMain,
/mnt/data/sda/space/yingyi/hdfsdata_giraph/mapred/local/taskTracker/yingyib/jobcache/job_201111172247_0003/work/_bspZooKeeper/zoo.cfg]
in directory
/mnt/data/sda/space/yingyi/hdfsdata_giraph/mapred/local/taskTracker/yingyib/jobcache/job_201111172247_0003/work/_bspZooKeeper
2011-11-17 22:56:39,056 INFO org.apache.giraph.zk.ZooKeeperManager:
onlineZooKeeperServers: Connect attempt 0 of 10 max trying to connect to
asterix-010:22181 with poll msecs = 3000
2011-11-17 22:56:39,058 WARN org.apache.giraph.zk.ZooKeeperManager:
onlineZooKeeperServers: Got ConnectException
java.net.ConnectException: Connection refused
        at java.net.PlainSocketImpl.socketConnect(Native Method)
        at java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:333)
        at
java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:195)
        at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:182)
        at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:366)
        at java.net.Socket.connect(Socket.java:529)
        at
org.apache.giraph.zk.ZooKeeperManager.onlineZooKeeperServers(ZooKeeperManager.java:612)
        at org.apache.giraph.graph.GraphMapper.setup(GraphMapper.java:401)
        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
        at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:396)
        at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
        at org.apache.hadoop.mapred.Child.main(Child.java:249)
2011-11-17 22:56:42,062 INFO org.apache.giraph.zk.ZooKeeperManager:
onlineZooKeeperServers: Connect attempt 1 of 10 max trying to connect to
asterix-010:22181 with poll msecs = 3000
2011-11-17 22:56:42,063 INFO org.apache.giraph.zk.ZooKeeperManager:
onlineZooKeeperServers: Connected!
2011-11-17 22:56:42,064 INFO org.apache.giraph.zk.ZooKeeperManager:
onlineZooKeeperServers: Creating my filestamp
_bsp/_defaultZkManagerDir/job_201111172247_0003/_zkServer/asterix-010 0
2011-11-17 22:56:42,070 INFO org.apache.giraph.graph.GraphMapper: setup:
Starting up BspServiceMaster (master thread)...
2011-11-17 22:56:42,080 INFO org.apache.giraph.graph.BspService:
BspService: Connecting to ZooKeeper with job job_201111172247_0003, 0 on
asterix-010:22181
2011-11-17 22:56:42,086 INFO org.apache.zookeeper.ZooKeeper: Client
environment:zookeeper.version=3.3.1-942149, built on 05/07/2010 17:14 GMT
2011-11-17 22:56:42,086 INFO org.apache.zookeeper.ZooKeeper: Client
environment:host.name=asterix-010
2011-11-17 22:56:42,086 INFO org.apache.zookeeper.ZooKeeper: Client
environment:java.version=1.6.0_21
2011-11-17 22:56:42,086 INFO org.apache.zookeeper.ZooKeeper: Client
environment:java.vendor=Sun Microsystems Inc.
2011-11-17 22:56:42,086 INFO org.apache.zookeeper.ZooKeeper: Client
environment:java.home=/mnt/data/sda/space/yingyi/tools/java/jre
2011-11-17 22:56:42,086 INFO org.apache.zookeeper.ZooKeeper: Client
environment:java.class.path=/mnt/data/sda/space/yingyi/hdfsdata_giraph/mapred/local/taskTracker/yingyib/jobcache/job_201111172247_0003/jars/classes:/mnt/data/sda/space/yingyi/hdfsdata_giraph/mapred/local/taskTracker/yingyib/jobcache/job_201111172247_0003/jars:/mnt/data/sda/space/yingyi/hdfsdata_giraph/mapred/local/taskTracker/yingyib/jobcache/job_201111172247_0003/attempt_201111172247_0003_m_000000_0/work:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../conf:/mnt/data/sda/space/yingyi/tools/java/lib/tools.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../build/classes:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../build:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../build/test/classes:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../build/tools:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/hadoop-core-0.20.205.0.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/asm-3.2.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/aspectjrt-1.6.5.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/aspectjtools-1.6.5.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/commons-beanutils-1.7.0.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/commons-beanutils-core-1.8.0.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/commons-cli-1.2.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/commons-codec-1.4.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/commons-collections-3.2.1.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/commons-configuration-1.6.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/commons-daemon-1.0.1.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/had
oop/lib/commons-digester-1.8.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/commons-el-1.0.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/commons-httpclient-3.0.1.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/commons-lang-2.4.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/commons-logging-1.1.1.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/commons-logging-api-1.0.4.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/commons-math-2.1.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/commons-net-1.4.1.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/core-3.1.1.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/hadoop-capacity-scheduler-0.20.205.0.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/hadoop-fairscheduler-0.20.205.0.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/hadoop-thriftfs-0.20.205.0.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/hsqldb-1.8.0.10.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/jackson-core-asl-1.0.1.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/jackson-mapper-asl-1.0.1.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/jasper-compiler-5.5.12.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/jasper-runtime-5.5.12.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/jdeb-0.8.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/jersey-core-1.8.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/jersey-json-1.8.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/jersey-server-1.8.jar:/mnt/data/sd
a/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/jets3t-0.6.1.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/jetty-6.1.26.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/jetty-util-6.1.26.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/jsch-0.1.42.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/junit-4.5.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/kfs-0.2.2.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/log4j-1.2.15.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/mockito-all-1.8.5.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/oro-2.0.8.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/servlet-api-2.5-20081211.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/slf4j-api-1.4.3.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/slf4j-log4j12-1.4.3.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/xmlenc-0.52.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/jsp-2.1/jsp-2.1.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/jsp-2.1/jsp-api-2.1.jar
2011-11-17 22:56:42,087 INFO org.apache.zookeeper.ZooKeeper: Client
environment:java.library.path=/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../lib:/mnt/data/sda/space/yingyi/hdfsdata_giraph/mapred/local/taskTracker/yingyib/jobcache/job_201111172247_0003/attempt_201111172247_0003_m_000000_0/work
2011-11-17 22:56:42,087 INFO org.apache.zookeeper.ZooKeeper: Client
environment:java.io.tmpdir=/mnt/data/sda/space/yingyi/hdfsdata_giraph/mapred/local/taskTracker/yingyib/jobcache/job_201111172247_0003/attempt_201111172247_0003_m_000000_0/work/tmp
2011-11-17 22:56:42,087 INFO org.apache.zookeeper.ZooKeeper: Client
environment:java.compiler=<NA>
2011-11-17 22:56:42,087 INFO org.apache.zookeeper.ZooKeeper: Client
environment:os.name=Linux
2011-11-17 22:56:42,087 INFO org.apache.zookeeper.ZooKeeper: Client
environment:os.arch=amd64
2011-11-17 22:56:42,087 INFO org.apache.zookeeper.ZooKeeper: Client
environment:os.version=2.6.18-194.26.1.el5
2011-11-17 22:56:42,087 INFO org.apache.zookeeper.ZooKeeper: Client
environment:user.name=yingyib
2011-11-17 22:56:42,087 INFO org.apache.zookeeper.ZooKeeper: Client
environment:user.home=/home/yingyib
2011-11-17 22:56:42,087 INFO org.apache.zookeeper.ZooKeeper: Client
environment:user.dir=/mnt/data/sda/space/yingyi/hdfsdata_giraph/mapred/local/taskTracker/yingyib/jobcache/job_201111172247_0003/attempt_201111172247_0003_m_000000_0/work
2011-11-17 22:56:42,088 INFO org.apache.zookeeper.ZooKeeper: Initiating
client connection, connectString=asterix-010:22181 sessionTimeout=60000
watcher=org.apache.giraph.graph.BspServiceMaster@13a78071
2011-11-17 22:56:42,098 INFO org.apache.zookeeper.ClientCnxn: Opening
socket connection to server asterix-010/10.0.0.10:22181
2011-11-17 22:56:42,099 INFO org.apache.zookeeper.ClientCnxn: Socket
connection established to asterix-010/10.0.0.10:22181, initiating session
2011-11-17 22:56:42,123 INFO org.apache.zookeeper.ClientCnxn: Session
establishment complete on server asterix-010/10.0.0.10:22181, sessionid =
0x133b57675b60000, negotiated timeout = 60000
2011-11-17 22:56:42,125 INFO org.apache.giraph.graph.BspService: process:
Asynchronous connection complete.
2011-11-17 22:56:42,126 INFO org.apache.giraph.graph.GraphMapper: map: No
need to do anything when not a worker
2011-11-17 22:56:42,126 INFO org.apache.giraph.graph.GraphMapper: cleanup:
Starting for MASTER_ZOOKEEPER_ONLY
2011-11-17 22:56:42,197 INFO org.apache.giraph.graph.BspServiceMaster:
becomeMaster: First child is
'/_hadoopBsp/job_201111172247_0003/_masterElectionDir/asterix-010_00000000000'
and my bid is
'/_hadoopBsp/job_201111172247_0003/_masterElectionDir/asterix-010_00000000000'
2011-11-17 22:56:42,197 INFO org.apache.giraph.graph.BspServiceMaster:
becomeMaster: I am now the master!
2011-11-17 22:56:42,208 INFO org.apache.giraph.graph.BspService: process:
applicationAttemptChanged signaled
2011-11-17 22:56:42,216 WARN org.apache.giraph.graph.BspService: process:
Unknown and unprocessed event
(path=/_hadoopBsp/job_201111172247_0003/_applicationAttemptsDir/0/_superstepDir,
type=NodeChildrenChanged, state=SyncConnected)
2011-11-17 22:56:45,130 INFO
org.apache.hadoop.mapreduce.lib.input.FileInputFormat: Total input paths to
process : 10
2011-11-17 22:56:45,227 INFO org.apache.giraph.graph.BspServiceMaster:
coordinateSuperstep: 0 out of 10 chosen workers finished on superstep -1
2011-11-17 23:01:20,045 ERROR org.apache.zookeeper.ClientCnxn: Error while
calling watcher
java.lang.RuntimeException:
org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode =
NoNode for
/_hadoopBsp/job_201111172247_0003/_applicationAttemptsDir/0/_superstepDir/-1/_vertexRangeAssignments
        at
org.apache.giraph.graph.BspService.getVertexRangeMap(BspService.java:885)
        at
org.apache.giraph.graph.BspServiceMaster.checkHealthyWorkerFailure(BspServiceMaster.java:1946)
        at
org.apache.giraph.graph.BspServiceMaster.processEvent(BspServiceMaster.java:1976)
        at org.apache.giraph.graph.BspService.process(BspService.java:1095)
        at
org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:488)
Caused by: org.apache.zookeeper.KeeperException$NoNodeException:
KeeperErrorCode = NoNode for
/_hadoopBsp/job_201111172247_0003/_applicationAttemptsDir/0/_superstepDir/-1/_vertexRangeAssignments
        at
org.apache.zookeeper.KeeperException.create(KeeperException.java:102)
        at
org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
        at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:921)
        at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:950)
        at
org.apache.giraph.graph.BspService.getVertexRangeMap(BspService.java:858)
        ... 4 more
2011-11-17 23:01:22,009 INFO org.apache.giraph.graph.BspServiceMaster:
coordinateSuperstep: 0 out of 10 chosen workers finished on superstep -1
2011-11-17 23:11:27,357 WARN org.apache.giraph.zk.ZooKeeperManager:
onlineZooKeeperServers: Forced a shutdown hook kill of the ZooKeeper process.


Mapper log on Node-2:
2011-11-17 22:56:44,158 INFO org.apache.zookeeper.ZooKeeper: Client
environment:zookeeper.version=3.3.1-942149, built on 05/07/2010 17:14 GMT
2011-11-17 22:56:44,158 INFO org.apache.zookeeper.ZooKeeper: Client
environment:host.name=asterix-001
2011-11-17 22:56:44,158 INFO org.apache.zookeeper.ZooKeeper: Client
environment:java.version=1.6.0_21
2011-11-17 22:56:44,158 INFO org.apache.zookeeper.ZooKeeper: Client
environment:java.vendor=Sun Microsystems Inc.
2011-11-17 22:56:44,158 INFO org.apache.zookeeper.ZooKeeper: Client
environment:java.home=/mnt/data/sda/space/yingyi/tools/java/jre
2011-11-17 22:56:44,158 INFO org.apache.zookeeper.ZooKeeper: Client
environment:java.class.path=/mnt/data/sda/space/yingyi/hdfsdata_giraph/mapred/local/taskTracker/yingyib/jobcache/job_201111172247_0003/jars/classes:/mnt/data/sda/space/yingyi/hdfsdata_giraph/mapred/local/taskTracker/yingyib/jobcache/job_201111172247_0003/jars:/mnt/data/sda/space/yingyi/hdfsdata_giraph/mapred/local/taskTracker/yingyib/jobcache/job_201111172247_0003/attempt_201111172247_0003_m_000008_0/work:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../conf:/mnt/data/sda/space/yingyi/tools/java/lib/tools.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../build/classes:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../build:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../build/test/classes:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../build/tools:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/hadoop-core-0.20.205.0.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/asm-3.2.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/aspectjrt-1.6.5.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/aspectjtools-1.6.5.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/commons-beanutils-1.7.0.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/commons-beanutils-core-1.8.0.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/commons-cli-1.2.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/commons-codec-1.4.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/commons-collections-3.2.1.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/commons-configuration-1.6.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/commons-daemon-1.0.1.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/had
oop/lib/commons-digester-1.8.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/commons-el-1.0.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/commons-httpclient-3.0.1.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/commons-lang-2.4.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/commons-logging-1.1.1.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/commons-logging-api-1.0.4.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/commons-math-2.1.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/commons-net-1.4.1.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/core-3.1.1.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/hadoop-capacity-scheduler-0.20.205.0.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/hadoop-fairscheduler-0.20.205.0.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/hadoop-thriftfs-0.20.205.0.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/hsqldb-1.8.0.10.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/jackson-core-asl-1.0.1.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/jackson-mapper-asl-1.0.1.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/jasper-compiler-5.5.12.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/jasper-runtime-5.5.12.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/jdeb-0.8.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/jersey-core-1.8.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/jersey-json-1.8.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/jersey-server-1.8.jar:/mnt/data/sd
a/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/jets3t-0.6.1.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/jetty-6.1.26.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/jetty-util-6.1.26.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/jsch-0.1.42.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/junit-4.5.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/kfs-0.2.2.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/log4j-1.2.15.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/mockito-all-1.8.5.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/oro-2.0.8.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/servlet-api-2.5-20081211.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/slf4j-api-1.4.3.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/slf4j-log4j12-1.4.3.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/xmlenc-0.52.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/jsp-2.1/jsp-2.1.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/jsp-2.1/jsp-api-2.1.jar
2011-11-17 22:56:44,158 INFO org.apache.zookeeper.ZooKeeper: Client
environment:java.library.path=/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../lib:/mnt/data/sda/space/yingyi/hdfsdata_giraph/mapred/local/taskTracker/yingyib/jobcache/job_201111172247_0003/attempt_201111172247_0003_m_000008_0/work
2011-11-17 22:56:44,158 INFO org.apache.zookeeper.ZooKeeper: Client
environment:java.io.tmpdir=/mnt/data/sda/space/yingyi/hdfsdata_giraph/mapred/local/taskTracker/yingyib/jobcache/job_201111172247_0003/attempt_201111172247_0003_m_000008_0/work/tmp
2011-11-17 22:56:44,158 INFO org.apache.zookeeper.ZooKeeper: Client
environment:java.compiler=<NA>
2011-11-17 22:56:44,158 INFO org.apache.zookeeper.ZooKeeper: Client
environment:os.name=Linux
2011-11-17 22:56:44,158 INFO org.apache.zookeeper.ZooKeeper: Client
environment:os.arch=amd64
2011-11-17 22:56:44,158 INFO org.apache.zookeeper.ZooKeeper: Client
environment:os.version=2.6.18-194.26.1.el5
2011-11-17 22:56:44,158 INFO org.apache.zookeeper.ZooKeeper: Client
environment:user.name=yingyib
2011-11-17 22:56:44,158 INFO org.apache.zookeeper.ZooKeeper: Client
environment:user.home=/home/yingyib
2011-11-17 22:56:44,158 INFO org.apache.zookeeper.ZooKeeper: Client
environment:user.dir=/mnt/data/sda/space/yingyi/hdfsdata_giraph/mapred/local/taskTracker/yingyib/jobcache/job_201111172247_0003/attempt_201111172247_0003_m_000008_0/work
2011-11-17 22:56:44,159 INFO org.apache.zookeeper.ZooKeeper: Initiating
client connection, connectString=asterix-010:22181 sessionTimeout=60000
watcher=org.apache.giraph.graph.BspServiceWorker@60ded0f0
2011-11-17 22:56:44,171 INFO org.apache.zookeeper.ClientCnxn: Opening
socket connection to server asterix-010/10.0.0.10:22181
2011-11-17 22:56:44,173 INFO org.apache.zookeeper.ClientCnxn: Socket
connection established to asterix-010/10.0.0.10:22181, initiating session
2011-11-17 22:56:44,178 INFO org.apache.zookeeper.ClientCnxn: Session
establishment complete on server asterix-010/10.0.0.10:22181, sessionid =
0x133b57675b60007, negotiated timeout = 60000
2011-11-17 22:56:44,180 INFO org.apache.giraph.graph.BspService: process:
Asynchronous connection complete.
2011-11-17 22:56:44,180 INFO org.apache.giraph.graph.GraphMapper: setup:
Registering health of this worker...
2011-11-17 22:56:44,191 INFO org.apache.giraph.graph.BspService:
getJobState: Job state already exists
(/_hadoopBsp/job_201111172247_0003/_masterJobState)
2011-11-17 22:56:44,195 INFO org.apache.giraph.graph.BspService:
getApplicationAttempt: Node
/_hadoopBsp/job_201111172247_0003/_applicationAttemptsDir already exists!
2011-11-17 22:56:44,198 INFO org.apache.giraph.graph.BspService:
getApplicationAttempt: Node
/_hadoopBsp/job_201111172247_0003/_applicationAttemptsDir already exists!
2011-11-17 22:56:44,204 INFO org.apache.giraph.graph.BspServiceWorker:
registerHealth: Created my health node for attempt=0, superstep=-1 with
/_hadoopBsp/job_201111172247_0003/_applicationAttemptsDir/0/_superstepDir/-1/_workerHealthyDir/asterix-001_8
and hostnamePort = ["asterix-001",30008]
2011-11-17 22:56:45,177 INFO org.apache.giraph.graph.BspService: process:
inputSplitsReadyChanged (input splits ready)
2011-11-17 22:56:45,192 WARN org.apache.giraph.graph.BspService: process:
Unknown and unprocessed event
(path=/_hadoopBsp/job_201111172247_0003/_inputSplitsDir/2/_inputSplitReserved,
type=NodeCreated, state=SyncConnected)
2011-11-17 22:56:45,192 INFO org.apache.giraph.graph.BspServiceWorker:
reserveInputSplit: Reserved input split path
/_hadoopBsp/job_201111172247_0003/_inputSplitsDir/2
2011-11-17 22:56:45,196 INFO org.apache.giraph.graph.BspServiceWorker:
loadVertices: Reserved /_hadoopBsp/job_201111172247_0003/_inputSplitsDir/2
from ZooKeeper and got input split
'hdfs://asterix-master:31888/webmap-tiny-sorted/part-00002:0+834285620'
2011-11-17 23:01:20,608 INFO org.apache.zookeeper.ClientCnxn: Client
session timed out, have not heard from server in 59117ms for sessionid
0x133b57675b60007, closing socket connection and attempting reconnect
2011-11-17 23:02:06,630 ERROR org.apache.zookeeper.ClientCnxn: Error while
calling watcher
java.lang.RuntimeException: process: Disconnected from ZooKeeper, cannot
recover.
        at org.apache.giraph.graph.BspService.process(BspService.java:990)
        at
org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:488)
2011-11-17 23:02:35,793 INFO org.apache.zookeeper.ClientCnxn: Opening
socket connection to server asterix-010/10.0.0.10:22181
2011-11-17 23:02:35,794 INFO org.apache.zookeeper.ClientCnxn: Socket
connection established to asterix-010/10.0.0.10:22181, initiating session
2011-11-17 23:02:35,806 INFO org.apache.zookeeper.ClientCnxn: Unable to
reconnect to ZooKeeper service, session 0x133b57675b60007 has expired,
closing socket connection
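
The Node-2 log above has no entries between the loadVertices line
(22:56:45,196) and the session timeout line (23:01:20,608), and the session
was negotiated with a timeout of 60000 ms.  A minimal sketch of measuring
such a gap from two log timestamps is given here.  The class and method names
(LogGap, gapMillis, exceedsSessionTimeout) are hypothetical and are not part
of Giraph or ZooKeeper.  The sketch only does arithmetic on the timestamps; it
does not say why the worker was silent.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical helper for reading gaps out of log timestamps such as
// "2011-11-17 22:56:45,196" (the format used in the mapper logs).
final class LogGap {
    private static final DateTimeFormatter LOG_TIME =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss,SSS");

    // Milliseconds elapsed between two log timestamps.
    static long gapMillis(String earlier, String later) {
        LocalDateTime start = LocalDateTime.parse(earlier, LOG_TIME);
        LocalDateTime end = LocalDateTime.parse(later, LOG_TIME);
        return Duration.between(start, end).toMillis();
    }

    // True when the gap is longer than the ZooKeeper session timeout.
    static boolean exceedsSessionTimeout(long gapMillis, long sessionTimeoutMillis) {
        return gapMillis > sessionTimeoutMillis;
    }

    public static void main(String[] args) {
        // Timestamps copied from the Node-2 log; 60000 is the negotiated timeout.
        long gap = gapMillis("2011-11-17 22:56:45,196", "2011-11-17 23:01:20,608");
        System.out.println("gap = " + gap + " ms, exceeds timeout: "
                + exceedsSessionTimeout(gap, 60000L));
    }
}
```

For the two timestamps above the gap is 275412 ms, several times the 60000 ms
session timeout.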

On Thu, Nov 17, 2011 at 9:46 PM, Avery Ching <aching@apache.org> wrote:

>  Hi Yingyi,
>
> Here are some ideas you might want to try:
>
> 1)  Limit the thread stack size.
>
> 2)  Set the heap available to the mapper JVM.
>
> For example, here's a setting that gives 10 GB of heap and uses a smaller
> stack (64k) for the threads:
>
> -Dmapred.child.java.opts="-Xms10g -Xmx10g -Xss64k"
>
> Also, you might want to try using EdgeListVertex instead of Vertex
> (i.e. GiraphJob.setVertexClass(EdgeListVertex.class)); it is quite a bit
> smaller.
>
> Let us know if that helps you.  You should also check to see if your
> Hadoop installation is using a 32-bit or 64-bit JVM.  If it's 32-bit you
> will be limited in how much heap you can use.
>
> Avery
>
>
> On 11/17/11 9:38 PM, Yingyi Bu wrote:
>
> Hi,
>
>     I'm running a Giraph PageRank job.  I tried it with 8GB of input text
> data over 10 nodes (each has 4 cores, 4 disks, and 12GB of physical
> memory), that is, 800MB of input data per machine.  However, the Giraph
> job fails because of high GC costs and an Out-of-Memory exception.
>      Do I need to set anything special in the Hadoop configuration, for
> example, the maximum heap size for the map task VM?
>     Thanks!!
>
>  Best regards,
> Yingyi
>
>
>
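
The check suggested in the quoted reply (whether the JVM is 32-bit or 64-bit,
and how much heap it can use) can be sketched in Java as follows.  This is an
illustration under assumptions, not something from the thread: the class name
JvmHeapCheck and its describe method are hypothetical, and the
"sun.arch.data.model" system property is specific to Sun/Oracle HotSpot-style
JVMs and may be absent elsewhere.  To say anything about the mapper JVM, the
check would have to be run with the same java binary and options that the
TaskTracker uses for its child tasks.

```java
// Hypothetical helper: reports JVM bitness and the maximum heap size.
final class JvmHeapCheck {
    // dataModel is expected to be "32" or "64"; anything else, including
    // null, is reported as "unknown".
    static String describe(String dataModel, long maxHeapBytes) {
        String bits = ("32".equals(dataModel) || "64".equals(dataModel))
                ? dataModel + "-bit"
                : "unknown";
        long maxHeapMb = maxHeapBytes / (1024L * 1024L);
        return bits + " JVM, max heap " + maxHeapMb + " MB";
    }

    public static void main(String[] args) {
        // Assumption: "sun.arch.data.model" is set by the JVM in use.
        String dataModel = System.getProperty("sun.arch.data.model");
        // maxMemory() reflects the effective -Xmx of this JVM.
        long maxHeapBytes = Runtime.getRuntime().maxMemory();
        System.out.println(describe(dataModel, maxHeapBytes));
    }
}
```

If the reported maximum heap is far below the value passed in -Xmx, the
options are probably not reaching that JVM.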
