giraph-user mailing list archives

From Avery Ching <ach...@apache.org>
Subject Re: Error while executing large graph
Date Wed, 14 May 2014 19:13:51 GMT
I think this is the key message.

0 out of 196 partitions computed; min free memory on worker 6 - 0.81MB, 
average 11.56MB

Less than 1 MB of free memory on a worker is not enough to make progress. 
Your workers are most likely running out of memory (OOM), which kills the 
job.  Can you get more memory for your job?
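
To spot this condition without reading the whole log, the progress line can be checked automatically. The following is only a minimal sketch: the regular expression is modelled on the single JobProgressTracker line quoted above, the function names are hypothetical, and the 1 MB default threshold is simply the rule of thumb from this reply, not a Giraph setting.

```python
import re

# Matches the memory report inside a JobProgressTracker line, based on the
# one example in this thread. The format may differ in other Giraph versions.
PROGRESS_RE = re.compile(
    r"min free memory on worker (?P<worker>\d+) - (?P<min_mb>[\d.]+)MB,\s+"
    r"average (?P<avg_mb>[\d.]+)MB"
)


def parse_free_memory(line):
    """Return (worker, min_mb, avg_mb), or None if no memory report is found."""
    match = PROGRESS_RE.search(line)
    if match is None:
        return None
    return (
        int(match.group("worker")),
        float(match.group("min_mb")),
        float(match.group("avg_mb")),
    )


def is_memory_starved(line, threshold_mb=1.0):
    """True if the reported minimum free memory is below threshold_mb.

    threshold_mb is an assumed cut-off taken from the advice above.
    """
    parsed = parse_free_memory(line)
    return parsed is not None and parsed[1] < threshold_mb


if __name__ == "__main__":
    line = (
        "Compute superstep 2: 0 out of 4847571 vertices computed; 0 out of 196 "
        "partitions computed; min free memory on worker 6 - 0.81MB, average 11.56MB"
    )
    print(parse_free_memory(line))
    print(is_memory_starved(line))
```

The input is expected to be the log line as written by the job client, so the "> " quote prefixes added by the mail client need to be stripped before parsing.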

On 5/14/14, 3:13 AM, Arun Kumar wrote:
> Hi, when I run a Giraph job against 1 GB of data, I get the exception 
> below after some time. Can somebody tell me what the issue is?
> 14/05/14 01:54:01 INFO job.JobProgressTracker: Data from 14 workers - 
> Compute superstep 2: 0 out of 4847571 vertices computed; 0 out of 196 
> partitions computed; min free memory on worker 6 - 0.81MB, average 11.56MB
> 14/05/14 01:54:03 INFO zookeeper.ClientCnxn: Unable to read additional 
> data from server sessionid 0x145f9cff031000f, likely server has closed 
> socket, closing socket connection and attempting reconnect
> 14/05/14 01:54:04 INFO zookeeper.ClientCnxn: Opening socket connection 
> to server mercado-12.hpl.hp.com/15.25.119.147:22181. Will not attempt 
> to authenticate using SASL (unknown error)
> 14/05/14 01:54:04 WARN zookeeper.ClientCnxn: Session 0x145f9cff031000f 
> for server null, unexpected error, closing socket connection and 
> attempting reconnect
> java.net.ConnectException: Connection refused
>     at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>     at 
> sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
>     at 
> org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:350)
>     at 
> org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1068)
> 14/05/14 01:54:06 INFO zookeeper.ClientCnxn: Opening socket connection 
> to server mercado-12.hpl.hp.com/15.25.119.147:22181. Will not attempt 
> to authenticate using SASL (unknown error)
> 14/05/14 01:54:06 WARN zookeeper.ClientCnxn: Session 0x145f9cff031000f 
> for server null, unexpected error, closing socket connection and 
> attempting reconnect
> java.net.ConnectException: Connection refused
>     at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>     at 
> sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
>     at 
> org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:350)
>     at 
> org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1068)
> 14/05/14 01:54:06 WARN zk.ZooKeeperExt: exists: Connection loss on 
> attempt 0, waiting 5000 msecs before retrying.
> org.apache.zookeeper.KeeperException$ConnectionLossException: 
> KeeperErrorCode = ConnectionLoss for 
> /_hadoopBsp/job_201405140108_0003/_workerProgresses
>     at 
> org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
>     at 
> org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
>     at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1041)
>     at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1069)
>     at org.apache.giraph.zk.ZooKeeperExt.exists(ZooKeeperExt.java:360)
>     at 
> org.apache.giraph.job.JobProgressTracker$2.run(JobProgressTracker.java:87)
>     at java.lang.Thread.run(Thread.java:745)
> 14/05/14 01:54:08 INFO zookeeper.ClientCnxn: Opening socket connection 
> to server mercado-12.hpl.hp.com/15.25.119.147:22181. Will not attempt 
> to authenticate using SASL (unknown error)
> 14/05/14 01:54:08 WARN zookeeper.ClientCnxn: Session 0x145f9cff031000f 
> for server null, unexpected error, closing socket connection and 
> attempting reconnect
> java.net.ConnectException: Connection refused
>     at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>     at 
> sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
>     at 
> org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:350)
>     at 
> org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1068)
> 14/05/14 01:54:09 INFO mapred.JobClient:  map 93% reduce 0%
> 14/05/14 01:54:10 INFO zookeeper.ClientCnxn: Opening socket connection 
> to server mercado-12.hpl.hp.com/15.25.119.147:22181. Will not attempt 
> to authenticate using SASL (unknown error)
> 14/05/14 01:54:10 WARN zookeeper.ClientCnxn: Session 0x145f9cff031000f 
> for server null, unexpected error, closing socket connection and 
> attempting reconnect
> java.net.ConnectException: Connection refused
>     at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>     at 
> sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
>     at 
> org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:350)
>     at 
> org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1068)
> 14/05/14 01:54:12 INFO zookeeper.ClientCnxn: Opening socket connection 
> to server mercado-12.hpl.hp.com/15.25.119.147:22181. Will not attempt 
> to authenticate using SASL (unknown error)
> 14/05/14 01:54:12 WARN zookeeper.ClientCnxn: Session 0x145f9cff031000f 
> for server null, unexpected error, closing socket connection and 
> attempting reconnect
> java.net.ConnectException: Connection refused
>     at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>     at 
> sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
>     at 
> org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:350)
>     at 
> org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1068)
> 14/05/14 01:54:12 WARN zk.ZooKeeperExt: exists: Connection loss on 
> attempt 1, waiting 5000 msecs before retrying.
> org.apache.zookeeper.KeeperException$ConnectionLossException: 
> KeeperErrorCode = ConnectionLoss for 
> /_hadoopBsp/job_201405140108_0003/_workerProgresses
>     at 
> org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
>     at 
> org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
>     at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1041)
>     at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1069)
>     at org.apache.giraph.zk.ZooKeeperExt.exists(ZooKeeperExt.java:360)
>     at 
> org.apache.giraph.job.JobProgressTracker$2.run(JobProgressTracker.java:87)
>     at java.lang.Thread.run(Thread.java:745)
> 14/05/14 01:54:13 INFO zookeeper.ClientCnxn: Opening socket connection 
> to server mercado-12.hpl.hp.com/15.25.119.147:22181. Will not attempt 
> to authenticate using SASL (unknown error)
> 14/05/14 01:54:13 WARN zookeeper.ClientCnxn: Session 0x145f9cff031000f 
> for server null, unexpected error, closing socket connection and 
> attempting reconnect
> java.net.ConnectException: Connection refused
>     at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>     at 
> sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
>     at 
> org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:350)
>     at 
> org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1068)
> 14/05/14 01:54:15 INFO zookeeper.ClientCnxn: Opening socket connection 
> to server mercado-12.hpl.hp.com/15.25.119.147:22181. Will not attempt 
> to authenticate using SASL (unknown error)
> 14/05/14 01:54:15 WARN zookeeper.ClientCnxn: Session 0x145f9cff031000f 
> for server null, unexpected error, closing socket connection and 
> attempting reconnect
> java.net.ConnectException: Connection refused
>     at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>     at 
> sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
>     at 
> org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:350)
>     at 
> org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1068)
> 14/05/14 01:54:16 INFO zookeeper.ClientCnxn: Opening socket connection 
> to server mercado-12.hpl.hp.com/15.25.119.147:22181. Will not attempt 
> to authenticate using SASL (unknown error)
> 14/05/14 01:54:16 WARN zookeeper.ClientCnxn: Session 0x145f9cff031000f 
> for server null, unexpected error, closing socket connection and 
> attempting reconnect
> java.net.ConnectException: Connection refused
>     at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>     at 
> sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
>     at 
> org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:350)
>     at 
> org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1068)
> 14/05/14 01:54:18 INFO zookeeper.ClientCnxn: Opening socket connection 
> to server mercado-12.hpl.hp.com/15.25.119.147:22181. Will not attempt 
> to authenticate using SASL (unknown error)
> 14/05/14 01:54:18 WARN zookeeper.ClientCnxn: Session 0x145f9cff031000f 
> for server null, unexpected error, closing socket connection and 
> attempting reconnect
> java.net.ConnectException: Connection refused
>     at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>     at 
> sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
>     at 
> org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:350)
>     at 
> org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1068)
> 14/05/14 01:54:18 WARN zk.ZooKeeperExt: exists: Connection loss on 
> attempt 2, waiting 5000 msecs before retrying.
> org.apache.zookeeper.KeeperException$ConnectionLossException: 
> KeeperErrorCode = ConnectionLoss for 
> /_hadoopBsp/job_201405140108_0003/_workerProgresses
>     at 
> org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
>     at 
> org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
>     at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1041)
>     at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1069)
>     at org.apache.giraph.zk.ZooKeeperExt.exists(ZooKeeperExt.java:360)
>     at 
> org.apache.giraph.job.JobProgressTracker$2.run(JobProgressTracker.java:87)
>     at java.lang.Thread.run(Thread.java:745)
> 14/05/14 01:54:20 INFO zookeeper.ClientCnxn: Opening socket connection 
> to server mercado-12.hpl.hp.com/15.25.119.147:22181. Will not attempt 
> to authenticate using SASL (unknown error)
> 14/05/14 01:54:20 WARN zookeeper.ClientCnxn: Session 0x145f9cff031000f 
> for server null, unexpected error, closing socket connection and 
> attempting reconnect
> java.net.ConnectException: Connection refused
>     at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>     at 
> sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
>     at 
> org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:350)
>     at 
> org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1068)
> 14/05/14 01:54:21 INFO zookeeper.ClientCnxn: Opening socket connection 
> to server mercado-12.hpl.hp.com/15.25.119.147:22181. Will not attempt 
> to authenticate using SASL (unknown error)
> 14/05/14 01:54:21 WARN zookeeper.ClientCnxn: Session 0x145f9cff031000f 
> for server null, unexpected error, closing socket connection and 
> attempting reconnect
> java.net.ConnectException: Connection refused
>     at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>     at 
> sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
>     at 
> org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:350)
>     at 
> org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1068)
> 14/05/14 01:54:22 INFO zookeeper.ClientCnxn: Opening socket connection 
> to server mercado-12.hpl.hp.com/15.25.119.147:22181. Will not attempt 
> to authenticate using SASL (unknown error)
> 14/05/14 01:54:22 WARN zookeeper.ClientCnxn: Session 0x145f9cff031000f 
> for server null, unexpected error, closing socket connection and 
> attempting reconnect
> java.net.ConnectException: Connection refused
>     at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>     at 
> sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
>     at 
> org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:350)
>     at 
> org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1068)
> 14/05/14 01:54:23 INFO job.JobProgressTracker: run: Exception occurred
> java.lang.IllegalStateException: exists: Failed to check 
> /_hadoopBsp/job_201405140108_0003/_workerProgresses after 3 tries!
>     at org.apache.giraph.zk.ZooKeeperExt.exists(ZooKeeperExt.java:369)
>     at 
> org.apache.giraph.job.JobProgressTracker$2.run(JobProgressTracker.java:87)
>     at java.lang.Thread.run(Thread.java:745)
> 14/05/14 01:54:24 INFO zookeeper.ClientCnxn: Opening socket connection 
> to server mercado-12.hpl.hp.com/15.25.119.147:22181. Will not attempt 
> to authenticate using SASL (unknown error)
> 14/05/14 01:54:24 WARN zookeeper.ClientCnxn: Session 0x145f9cff031000f 
> for server null, unexpected error, closing socket connection and 
> attempting reconnect
> java.net.ConnectException: Connection refused
>     at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>     at 
> sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
>     at 
> org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:350)
>     at 
> org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1068)
> 14/05/14 01:54:24 WARN zk.ZooKeeperExt: createExt: Connection loss on 
> attempt 0, waiting 5000 msecs before retrying.
> org.apache.zookeeper.KeeperException$ConnectionLossException: 
> KeeperErrorCode = ConnectionLoss for 
> /_hadoopBsp/job_201405140108_0003/_cleanedUpDir/client
>     at 
> org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
>     at 
> org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
>     at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:783)
>     at org.apache.giraph.zk.ZooKeeperExt.createExt(ZooKeeperExt.java:152)
>     at 
> org.apache.giraph.job.JobProgressTracker$2.run(JobProgressTracker.java:123)
>     at java.lang.Thread.run(Thread.java:745)
> 14/05/14 01:54:25 INFO zookeeper.ClientCnxn: Opening socket connection 
> to server mercado-12.hpl.hp.com/15.25.119.147:22181. Will not attempt 
> to authenticate using SASL (unknown error)
> 14/05/14 01:54:25 WARN zookeeper.ClientCnxn: Session 0x145f9cff031000f 
> for server null, unexpected error, closing socket connection and 
> attempting reconnect
> java.net.ConnectException: Connection refused
>     at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>     at 
> sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
>     at 
> org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:350)
>     at 
> org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1068)
> 14/05/14 01:54:27 INFO zookeeper.ClientCnxn: Opening socket connection 
> to server mercado-12.hpl.hp.com/15.25.119.147:22181. Will not attempt 
> to authenticate using SASL (unknown error)
> 14/05/14 01:54:27 WARN zookeeper.ClientCnxn: Session 0x145f9cff031000f 
> for server null, unexpected error, closing socket connection and 
> attempting reconnect
> java.net.ConnectException: Connection refused
>     at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>     at 
> sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
>     at 
> org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:350)
>     at 
> org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1068)
> 14/05/14 01:54:29 INFO mapred.JobClient:  map 86% reduce 0%
> 14/05/14 01:54:30 INFO zookeeper.ClientCnxn: Opening socket connection 
> to server mercado-12.hpl.hp.com/15.25.119.147:22181. Will not attempt 
> to authenticate using SASL (unknown error)
> 14/05/14 01:54:30 WARN zookeeper.ClientCnxn: Session 0x145f9cff031000f 
> for server null, unexpected error, closing socket connection and 
> attempting reconnect
> java.net.ConnectException: Connection refused
>     at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>     at 
> sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
>     at 
> org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:350)
>     at 
> org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1068)
> 14/05/14 01:54:30 WARN zk.ZooKeeperExt: createExt: Connection loss on 
> attempt 1, waiting 5000 msecs before retrying.
> org.apache.zookeeper.KeeperException$ConnectionLossException: 
> KeeperErrorCode = ConnectionLoss for 
> /_hadoopBsp/job_201405140108_0003/_cleanedUpDir/client
>     at 
> org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
>     at 
> org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
>     at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:783)
>     at org.apache.giraph.zk.ZooKeeperExt.createExt(ZooKeeperExt.java:152)
>     at 
> org.apache.giraph.job.JobProgressTracker$2.run(JobProgressTracker.java:123)
>     at java.lang.Thread.run(Thread.java:745)
> 14/05/14 01:54:30 INFO mapred.JobClient: Job complete: 
> job_201405140108_0003
> 14/05/14 01:54:30 INFO mapred.JobClient: Counters: 6
> 14/05/14 01:54:30 INFO mapred.JobClient:   Job Counters
> 14/05/14 01:54:30 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=30036780
> 14/05/14 01:54:30 INFO mapred.JobClient:     Total time spent by all 
> reduces waiting after reserving slots (ms)=0
> 14/05/14 01:54:30 INFO mapred.JobClient:     Total time spent by all 
> maps waiting after reserving slots (ms)=0
> 14/05/14 01:54:30 INFO mapred.JobClient:     Launched map tasks=15
> 14/05/14 01:54:30 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=0
> 14/05/14 01:54:30 INFO mapred.JobClient:     Failed map tasks=1
>
> Regards
> Arun
>

