giraph-user mailing list archives

From Arun Kumar <toga...@gmail.com>
Subject Error while executing large graph
Date Wed, 14 May 2014 10:13:21 GMT
Hi, when I run a Giraph job against 1 GB of data, I get the exception below
after some time. Can somebody tell me what the issue is?
14/05/14 01:54:01 INFO job.JobProgressTracker: Data from 14 workers -
Compute superstep 2: 0 out of 4847571 vertices computed; 0 out of 196
partitions computed; min free memory on worker 6 - 0.81MB, average 11.56MB
14/05/14 01:54:03 INFO zookeeper.ClientCnxn: Unable to read additional data
from server sessionid 0x145f9cff031000f, likely server has closed socket,
closing socket connection and attempting reconnect
14/05/14 01:54:04 INFO zookeeper.ClientCnxn: Opening socket connection to
server mercado-12.hpl.hp.com/15.25.119.147:22181. Will not attempt to
authenticate using SASL (unknown error)
14/05/14 01:54:04 WARN zookeeper.ClientCnxn: Session 0x145f9cff031000f for
server null, unexpected error, closing socket connection and attempting
reconnect
java.net.ConnectException: Connection refused
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at
sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
    at
org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:350)
    at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1068)
14/05/14 01:54:06 INFO zookeeper.ClientCnxn: Opening socket connection to
server mercado-12.hpl.hp.com/15.25.119.147:22181. Will not attempt to
authenticate using SASL (unknown error)
14/05/14 01:54:06 WARN zookeeper.ClientCnxn: Session 0x145f9cff031000f for
server null, unexpected error, closing socket connection and attempting
reconnect
java.net.ConnectException: Connection refused
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at
sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
    at
org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:350)
    at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1068)
14/05/14 01:54:06 WARN zk.ZooKeeperExt: exists: Connection loss on attempt
0, waiting 5000 msecs before retrying.
org.apache.zookeeper.KeeperException$ConnectionLossException:
KeeperErrorCode = ConnectionLoss for
/_hadoopBsp/job_201405140108_0003/_workerProgresses
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
    at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1041)
    at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1069)
    at org.apache.giraph.zk.ZooKeeperExt.exists(ZooKeeperExt.java:360)
    at
org.apache.giraph.job.JobProgressTracker$2.run(JobProgressTracker.java:87)
    at java.lang.Thread.run(Thread.java:745)
14/05/14 01:54:08 INFO zookeeper.ClientCnxn: Opening socket connection to
server mercado-12.hpl.hp.com/15.25.119.147:22181. Will not attempt to
authenticate using SASL (unknown error)
14/05/14 01:54:08 WARN zookeeper.ClientCnxn: Session 0x145f9cff031000f for
server null, unexpected error, closing socket connection and attempting
reconnect
java.net.ConnectException: Connection refused
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at
sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
    at
org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:350)
    at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1068)
14/05/14 01:54:09 INFO mapred.JobClient:  map 93% reduce 0%
14/05/14 01:54:10 INFO zookeeper.ClientCnxn: Opening socket connection to
server mercado-12.hpl.hp.com/15.25.119.147:22181. Will not attempt to
authenticate using SASL (unknown error)
14/05/14 01:54:10 WARN zookeeper.ClientCnxn: Session 0x145f9cff031000f for
server null, unexpected error, closing socket connection and attempting
reconnect
java.net.ConnectException: Connection refused
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at
sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
    at
org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:350)
    at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1068)
14/05/14 01:54:12 INFO zookeeper.ClientCnxn: Opening socket connection to
server mercado-12.hpl.hp.com/15.25.119.147:22181. Will not attempt to
authenticate using SASL (unknown error)
14/05/14 01:54:12 WARN zookeeper.ClientCnxn: Session 0x145f9cff031000f for
server null, unexpected error, closing socket connection and attempting
reconnect
java.net.ConnectException: Connection refused
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at
sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
    at
org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:350)
    at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1068)
14/05/14 01:54:12 WARN zk.ZooKeeperExt: exists: Connection loss on attempt
1, waiting 5000 msecs before retrying.
org.apache.zookeeper.KeeperException$ConnectionLossException:
KeeperErrorCode = ConnectionLoss for
/_hadoopBsp/job_201405140108_0003/_workerProgresses
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
    at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1041)
    at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1069)
    at org.apache.giraph.zk.ZooKeeperExt.exists(ZooKeeperExt.java:360)
    at
org.apache.giraph.job.JobProgressTracker$2.run(JobProgressTracker.java:87)
    at java.lang.Thread.run(Thread.java:745)
14/05/14 01:54:13 INFO zookeeper.ClientCnxn: Opening socket connection to
server mercado-12.hpl.hp.com/15.25.119.147:22181. Will not attempt to
authenticate using SASL (unknown error)
14/05/14 01:54:13 WARN zookeeper.ClientCnxn: Session 0x145f9cff031000f for
server null, unexpected error, closing socket connection and attempting
reconnect
java.net.ConnectException: Connection refused
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at
sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
    at
org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:350)
    at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1068)
14/05/14 01:54:15 INFO zookeeper.ClientCnxn: Opening socket connection to
server mercado-12.hpl.hp.com/15.25.119.147:22181. Will not attempt to
authenticate using SASL (unknown error)
14/05/14 01:54:15 WARN zookeeper.ClientCnxn: Session 0x145f9cff031000f for
server null, unexpected error, closing socket connection and attempting
reconnect
java.net.ConnectException: Connection refused
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at
sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
    at
org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:350)
    at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1068)
14/05/14 01:54:16 INFO zookeeper.ClientCnxn: Opening socket connection to
server mercado-12.hpl.hp.com/15.25.119.147:22181. Will not attempt to
authenticate using SASL (unknown error)
14/05/14 01:54:16 WARN zookeeper.ClientCnxn: Session 0x145f9cff031000f for
server null, unexpected error, closing socket connection and attempting
reconnect
java.net.ConnectException: Connection refused
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at
sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
    at
org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:350)
    at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1068)
14/05/14 01:54:18 INFO zookeeper.ClientCnxn: Opening socket connection to
server mercado-12.hpl.hp.com/15.25.119.147:22181. Will not attempt to
authenticate using SASL (unknown error)
14/05/14 01:54:18 WARN zookeeper.ClientCnxn: Session 0x145f9cff031000f for
server null, unexpected error, closing socket connection and attempting
reconnect
java.net.ConnectException: Connection refused
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at
sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
    at
org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:350)
    at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1068)
14/05/14 01:54:18 WARN zk.ZooKeeperExt: exists: Connection loss on attempt
2, waiting 5000 msecs before retrying.
org.apache.zookeeper.KeeperException$ConnectionLossException:
KeeperErrorCode = ConnectionLoss for
/_hadoopBsp/job_201405140108_0003/_workerProgresses
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
    at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1041)
    at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1069)
    at org.apache.giraph.zk.ZooKeeperExt.exists(ZooKeeperExt.java:360)
    at
org.apache.giraph.job.JobProgressTracker$2.run(JobProgressTracker.java:87)
    at java.lang.Thread.run(Thread.java:745)
14/05/14 01:54:20 INFO zookeeper.ClientCnxn: Opening socket connection to
server mercado-12.hpl.hp.com/15.25.119.147:22181. Will not attempt to
authenticate using SASL (unknown error)
14/05/14 01:54:20 WARN zookeeper.ClientCnxn: Session 0x145f9cff031000f for
server null, unexpected error, closing socket connection and attempting
reconnect
java.net.ConnectException: Connection refused
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at
sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
    at
org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:350)
    at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1068)
14/05/14 01:54:21 INFO zookeeper.ClientCnxn: Opening socket connection to
server mercado-12.hpl.hp.com/15.25.119.147:22181. Will not attempt to
authenticate using SASL (unknown error)
14/05/14 01:54:21 WARN zookeeper.ClientCnxn: Session 0x145f9cff031000f for
server null, unexpected error, closing socket connection and attempting
reconnect
java.net.ConnectException: Connection refused
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at
sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
    at
org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:350)
    at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1068)
14/05/14 01:54:22 INFO zookeeper.ClientCnxn: Opening socket connection to
server mercado-12.hpl.hp.com/15.25.119.147:22181. Will not attempt to
authenticate using SASL (unknown error)
14/05/14 01:54:22 WARN zookeeper.ClientCnxn: Session 0x145f9cff031000f for
server null, unexpected error, closing socket connection and attempting
reconnect
java.net.ConnectException: Connection refused
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at
sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
    at
org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:350)
    at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1068)
14/05/14 01:54:23 INFO job.JobProgressTracker: run: Exception occurred
java.lang.IllegalStateException: exists: Failed to check
/_hadoopBsp/job_201405140108_0003/_workerProgresses after 3 tries!
    at org.apache.giraph.zk.ZooKeeperExt.exists(ZooKeeperExt.java:369)
    at
org.apache.giraph.job.JobProgressTracker$2.run(JobProgressTracker.java:87)
    at java.lang.Thread.run(Thread.java:745)
14/05/14 01:54:24 INFO zookeeper.ClientCnxn: Opening socket connection to
server mercado-12.hpl.hp.com/15.25.119.147:22181. Will not attempt to
authenticate using SASL (unknown error)
14/05/14 01:54:24 WARN zookeeper.ClientCnxn: Session 0x145f9cff031000f for
server null, unexpected error, closing socket connection and attempting
reconnect
java.net.ConnectException: Connection refused
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at
sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
    at
org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:350)
    at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1068)
14/05/14 01:54:24 WARN zk.ZooKeeperExt: createExt: Connection loss on
attempt 0, waiting 5000 msecs before retrying.
org.apache.zookeeper.KeeperException$ConnectionLossException:
KeeperErrorCode = ConnectionLoss for
/_hadoopBsp/job_201405140108_0003/_cleanedUpDir/client
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
    at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:783)
    at org.apache.giraph.zk.ZooKeeperExt.createExt(ZooKeeperExt.java:152)
    at
org.apache.giraph.job.JobProgressTracker$2.run(JobProgressTracker.java:123)
    at java.lang.Thread.run(Thread.java:745)
14/05/14 01:54:25 INFO zookeeper.ClientCnxn: Opening socket connection to
server mercado-12.hpl.hp.com/15.25.119.147:22181. Will not attempt to
authenticate using SASL (unknown error)
14/05/14 01:54:25 WARN zookeeper.ClientCnxn: Session 0x145f9cff031000f for
server null, unexpected error, closing socket connection and attempting
reconnect
java.net.ConnectException: Connection refused
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at
sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
    at
org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:350)
    at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1068)
14/05/14 01:54:27 INFO zookeeper.ClientCnxn: Opening socket connection to
server mercado-12.hpl.hp.com/15.25.119.147:22181. Will not attempt to
authenticate using SASL (unknown error)
14/05/14 01:54:27 WARN zookeeper.ClientCnxn: Session 0x145f9cff031000f for
server null, unexpected error, closing socket connection and attempting
reconnect
java.net.ConnectException: Connection refused
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at
sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
    at
org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:350)
    at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1068)
14/05/14 01:54:29 INFO mapred.JobClient:  map 86% reduce 0%
14/05/14 01:54:30 INFO zookeeper.ClientCnxn: Opening socket connection to
server mercado-12.hpl.hp.com/15.25.119.147:22181. Will not attempt to
authenticate using SASL (unknown error)
14/05/14 01:54:30 WARN zookeeper.ClientCnxn: Session 0x145f9cff031000f for
server null, unexpected error, closing socket connection and attempting
reconnect
java.net.ConnectException: Connection refused
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at
sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
    at
org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:350)
    at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1068)
14/05/14 01:54:30 WARN zk.ZooKeeperExt: createExt: Connection loss on
attempt 1, waiting 5000 msecs before retrying.
org.apache.zookeeper.KeeperException$ConnectionLossException:
KeeperErrorCode = ConnectionLoss for
/_hadoopBsp/job_201405140108_0003/_cleanedUpDir/client
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
    at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:783)
    at org.apache.giraph.zk.ZooKeeperExt.createExt(ZooKeeperExt.java:152)
    at
org.apache.giraph.job.JobProgressTracker$2.run(JobProgressTracker.java:123)
    at java.lang.Thread.run(Thread.java:745)
14/05/14 01:54:30 INFO mapred.JobClient: Job complete: job_201405140108_0003
14/05/14 01:54:30 INFO mapred.JobClient: Counters: 6
14/05/14 01:54:30 INFO mapred.JobClient:   Job Counters
14/05/14 01:54:30 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=30036780
14/05/14 01:54:30 INFO mapred.JobClient:     Total time spent by all
reduces waiting after reserving slots (ms)=0
14/05/14 01:54:30 INFO mapred.JobClient:     Total time spent by all maps
waiting after reserving slots (ms)=0
14/05/14 01:54:30 INFO mapred.JobClient:     Launched map tasks=15
14/05/14 01:54:30 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=0
14/05/14 01:54:30 INFO mapred.JobClient:     Failed map tasks=1

Regards
Arun
