giraph-user mailing list archives

From Arun Kumar <toga...@gmail.com>
Subject Re: Error while executing large graph
Date Thu, 15 May 2014 08:09:14 GMT
Hi,
Thanks for the reply.

I am running this example on a cluster of 5 machines. Each machine has
16 GB of RAM and 4 cores. The Java heap size is set to 2000 MB,
java.child.options is also set to 2000 MB, and the total number of map
instances is set to 3.
So 10 GB will be used on each slave machine.

My input data is 1 GB in size.
In this scenario, how can an out-of-memory error occur? Please clarify.
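
As a rough sanity check of the figures above, here is a minimal Python
sketch of the heap arithmetic. It is only an illustration: the names are
made up for this example, it assumes the "3 map instances" setting means
3 map tasks per machine (which matches the 15 launched map tasks in the
log below), and it assumes every task gets exactly the 2000 MB heap.

```python
# Figures taken from this thread; all names here are illustrative only.
MACHINES = 5
MAP_SLOTS_PER_MACHINE = 3   # assumption: "3 map instances" is per machine
HEAP_MB_PER_TASK = 2000     # configured child JVM heap
WORKERS = 14                # "Data from 14 workers" in the job log
INPUT_MB = 1024             # 1 GB of input, treated as 1024 MB
VERTICES = 4847571          # vertex count reported in the job log


def heap_budget():
    """Return the configured heap totals and each worker's share of the input."""
    tasks = MACHINES * MAP_SLOTS_PER_MACHINE
    return {
        "tasks": tasks,
        "heap_per_machine_mb": MAP_SLOTS_PER_MACHINE * HEAP_MB_PER_TASK,
        "total_heap_mb": tasks * HEAP_MB_PER_TASK,
        "input_mb_per_worker": INPUT_MB / WORKERS,
        "vertices_per_worker": VERTICES / WORKERS,
    }


if __name__ == "__main__":
    for name, value in heap_budget().items():
        print(f"{name}: {value}")
```

Under these assumptions the task heap alone comes to 6000 MB per machine
and 30000 MB across the cluster, while each worker's share of the input is
roughly 73 MB on disk. On-disk size says little about heap use, though:
vertices, edges and in-flight messages are held as Java objects, so the
in-memory footprint can be many times larger than the input file.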

Regards
Arun




On Thu, May 15, 2014 at 12:43 AM, Avery Ching <aching@apache.org> wrote:

>  I think this is the key message.
>
>
> 0 out of 196 partitions computed; min free memory on worker 6 - 0.81MB,
> average 11.56MB
>
> Having less than 1 MB free won't work.  Your workers are likely OOM,
> killing the job.  Can you get more memory for your job?
>
>
> On 5/14/14, 3:13 AM, Arun Kumar wrote:
>
>  Hi, when I run a Giraph job against 1 GB of data, I get the exception
> below after some time. Can somebody tell me what the issue is?
> 14/05/14 01:54:01 INFO job.JobProgressTracker: Data from 14 workers -
> Compute superstep 2: 0 out of 4847571 vertices computed; 0 out of 196
> partitions computed; min free memory on worker 6 - 0.81MB, average 11.56MB
> 14/05/14 01:54:03 INFO zookeeper.ClientCnxn: Unable to read additional
> data from server sessionid 0x145f9cff031000f, likely server has closed
> socket, closing socket connection and attempting reconnect
> 14/05/14 01:54:04 INFO zookeeper.ClientCnxn: Opening socket connection to
> server mercado-12.hpl.hp.com/15.25.119.147:22181. Will not attempt to
> authenticate using SASL (unknown error)
> 14/05/14 01:54:04 WARN zookeeper.ClientCnxn: Session 0x145f9cff031000f for
> server null, unexpected error, closing socket connection and attempting
> reconnect
> java.net.ConnectException: Connection refused
>     at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>     at
> sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
>     at
> org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:350)
>     at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1068)
> 14/05/14 01:54:06 INFO zookeeper.ClientCnxn: Opening socket connection to
> server mercado-12.hpl.hp.com/15.25.119.147:22181. Will not attempt to
> authenticate using SASL (unknown error)
> 14/05/14 01:54:06 WARN zookeeper.ClientCnxn: Session 0x145f9cff031000f for
> server null, unexpected error, closing socket connection and attempting
> reconnect
> java.net.ConnectException: Connection refused
>     at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>     at
> sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
>     at
> org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:350)
>     at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1068)
> 14/05/14 01:54:06 WARN zk.ZooKeeperExt: exists: Connection loss on attempt
> 0, waiting 5000 msecs before retrying.
> org.apache.zookeeper.KeeperException$ConnectionLossException:
> KeeperErrorCode = ConnectionLoss for
> /_hadoopBsp/job_201405140108_0003/_workerProgresses
>     at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
>     at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
>     at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1041)
>     at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1069)
>     at org.apache.giraph.zk.ZooKeeperExt.exists(ZooKeeperExt.java:360)
>     at
> org.apache.giraph.job.JobProgressTracker$2.run(JobProgressTracker.java:87)
>     at java.lang.Thread.run(Thread.java:745)
> 14/05/14 01:54:08 INFO zookeeper.ClientCnxn: Opening socket connection to
> server mercado-12.hpl.hp.com/15.25.119.147:22181. Will not attempt to
> authenticate using SASL (unknown error)
> 14/05/14 01:54:08 WARN zookeeper.ClientCnxn: Session 0x145f9cff031000f for
> server null, unexpected error, closing socket connection and attempting
> reconnect
> java.net.ConnectException: Connection refused
>     at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>     at
> sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
>     at
> org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:350)
>     at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1068)
> 14/05/14 01:54:09 INFO mapred.JobClient:  map 93% reduce 0%
> 14/05/14 01:54:10 INFO zookeeper.ClientCnxn: Opening socket connection to
> server mercado-12.hpl.hp.com/15.25.119.147:22181. Will not attempt to
> authenticate using SASL (unknown error)
> 14/05/14 01:54:10 WARN zookeeper.ClientCnxn: Session 0x145f9cff031000f for
> server null, unexpected error, closing socket connection and attempting
> reconnect
> java.net.ConnectException: Connection refused
>     at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>     at
> sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
>     at
> org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:350)
>     at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1068)
> 14/05/14 01:54:12 INFO zookeeper.ClientCnxn: Opening socket connection to
> server mercado-12.hpl.hp.com/15.25.119.147:22181. Will not attempt to
> authenticate using SASL (unknown error)
> 14/05/14 01:54:12 WARN zookeeper.ClientCnxn: Session 0x145f9cff031000f for
> server null, unexpected error, closing socket connection and attempting
> reconnect
> java.net.ConnectException: Connection refused
>     at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>     at
> sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
>     at
> org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:350)
>     at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1068)
> 14/05/14 01:54:12 WARN zk.ZooKeeperExt: exists: Connection loss on attempt
> 1, waiting 5000 msecs before retrying.
> org.apache.zookeeper.KeeperException$ConnectionLossException:
> KeeperErrorCode = ConnectionLoss for
> /_hadoopBsp/job_201405140108_0003/_workerProgresses
>     at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
>     at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
>     at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1041)
>     at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1069)
>     at org.apache.giraph.zk.ZooKeeperExt.exists(ZooKeeperExt.java:360)
>     at
> org.apache.giraph.job.JobProgressTracker$2.run(JobProgressTracker.java:87)
>     at java.lang.Thread.run(Thread.java:745)
> 14/05/14 01:54:13 INFO zookeeper.ClientCnxn: Opening socket connection to
> server mercado-12.hpl.hp.com/15.25.119.147:22181. Will not attempt to
> authenticate using SASL (unknown error)
> 14/05/14 01:54:13 WARN zookeeper.ClientCnxn: Session 0x145f9cff031000f for
> server null, unexpected error, closing socket connection and attempting
> reconnect
> java.net.ConnectException: Connection refused
>     at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>     at
> sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
>     at
> org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:350)
>     at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1068)
> 14/05/14 01:54:15 INFO zookeeper.ClientCnxn: Opening socket connection to
> server mercado-12.hpl.hp.com/15.25.119.147:22181. Will not attempt to
> authenticate using SASL (unknown error)
> 14/05/14 01:54:15 WARN zookeeper.ClientCnxn: Session 0x145f9cff031000f for
> server null, unexpected error, closing socket connection and attempting
> reconnect
> java.net.ConnectException: Connection refused
>     at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>     at
> sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
>     at
> org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:350)
>     at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1068)
> 14/05/14 01:54:16 INFO zookeeper.ClientCnxn: Opening socket connection to
> server mercado-12.hpl.hp.com/15.25.119.147:22181. Will not attempt to
> authenticate using SASL (unknown error)
> 14/05/14 01:54:16 WARN zookeeper.ClientCnxn: Session 0x145f9cff031000f for
> server null, unexpected error, closing socket connection and attempting
> reconnect
> java.net.ConnectException: Connection refused
>     at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>     at
> sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
>     at
> org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:350)
>     at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1068)
> 14/05/14 01:54:18 INFO zookeeper.ClientCnxn: Opening socket connection to
> server mercado-12.hpl.hp.com/15.25.119.147:22181. Will not attempt to
> authenticate using SASL (unknown error)
> 14/05/14 01:54:18 WARN zookeeper.ClientCnxn: Session 0x145f9cff031000f for
> server null, unexpected error, closing socket connection and attempting
> reconnect
> java.net.ConnectException: Connection refused
>     at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>     at
> sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
>     at
> org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:350)
>     at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1068)
> 14/05/14 01:54:18 WARN zk.ZooKeeperExt: exists: Connection loss on attempt
> 2, waiting 5000 msecs before retrying.
> org.apache.zookeeper.KeeperException$ConnectionLossException:
> KeeperErrorCode = ConnectionLoss for
> /_hadoopBsp/job_201405140108_0003/_workerProgresses
>     at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
>     at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
>     at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1041)
>     at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1069)
>     at org.apache.giraph.zk.ZooKeeperExt.exists(ZooKeeperExt.java:360)
>     at
> org.apache.giraph.job.JobProgressTracker$2.run(JobProgressTracker.java:87)
>     at java.lang.Thread.run(Thread.java:745)
> 14/05/14 01:54:20 INFO zookeeper.ClientCnxn: Opening socket connection to
> server mercado-12.hpl.hp.com/15.25.119.147:22181. Will not attempt to
> authenticate using SASL (unknown error)
> 14/05/14 01:54:20 WARN zookeeper.ClientCnxn: Session 0x145f9cff031000f for
> server null, unexpected error, closing socket connection and attempting
> reconnect
> java.net.ConnectException: Connection refused
>     at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>     at
> sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
>     at
> org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:350)
>     at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1068)
> 14/05/14 01:54:21 INFO zookeeper.ClientCnxn: Opening socket connection to
> server mercado-12.hpl.hp.com/15.25.119.147:22181. Will not attempt to
> authenticate using SASL (unknown error)
> 14/05/14 01:54:21 WARN zookeeper.ClientCnxn: Session 0x145f9cff031000f for
> server null, unexpected error, closing socket connection and attempting
> reconnect
> java.net.ConnectException: Connection refused
>     at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>     at
> sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
>     at
> org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:350)
>     at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1068)
> 14/05/14 01:54:22 INFO zookeeper.ClientCnxn: Opening socket connection to
> server mercado-12.hpl.hp.com/15.25.119.147:22181. Will not attempt to
> authenticate using SASL (unknown error)
> 14/05/14 01:54:22 WARN zookeeper.ClientCnxn: Session 0x145f9cff031000f for
> server null, unexpected error, closing socket connection and attempting
> reconnect
> java.net.ConnectException: Connection refused
>     at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>     at
> sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
>     at
> org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:350)
>     at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1068)
> 14/05/14 01:54:23 INFO job.JobProgressTracker: run: Exception occurred
> java.lang.IllegalStateException: exists: Failed to check
> /_hadoopBsp/job_201405140108_0003/_workerProgresses after 3 tries!
>     at org.apache.giraph.zk.ZooKeeperExt.exists(ZooKeeperExt.java:369)
>     at
> org.apache.giraph.job.JobProgressTracker$2.run(JobProgressTracker.java:87)
>     at java.lang.Thread.run(Thread.java:745)
> 14/05/14 01:54:24 INFO zookeeper.ClientCnxn: Opening socket connection to
> server mercado-12.hpl.hp.com/15.25.119.147:22181. Will not attempt to
> authenticate using SASL (unknown error)
> 14/05/14 01:54:24 WARN zookeeper.ClientCnxn: Session 0x145f9cff031000f for
> server null, unexpected error, closing socket connection and attempting
> reconnect
> java.net.ConnectException: Connection refused
>     at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>     at
> sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
>     at
> org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:350)
>     at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1068)
> 14/05/14 01:54:24 WARN zk.ZooKeeperExt: createExt: Connection loss on
> attempt 0, waiting 5000 msecs before retrying.
> org.apache.zookeeper.KeeperException$ConnectionLossException:
> KeeperErrorCode = ConnectionLoss for
> /_hadoopBsp/job_201405140108_0003/_cleanedUpDir/client
>     at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
>     at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
>     at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:783)
>     at org.apache.giraph.zk.ZooKeeperExt.createExt(ZooKeeperExt.java:152)
>     at
> org.apache.giraph.job.JobProgressTracker$2.run(JobProgressTracker.java:123)
>     at java.lang.Thread.run(Thread.java:745)
> 14/05/14 01:54:25 INFO zookeeper.ClientCnxn: Opening socket connection to
> server mercado-12.hpl.hp.com/15.25.119.147:22181. Will not attempt to
> authenticate using SASL (unknown error)
> 14/05/14 01:54:25 WARN zookeeper.ClientCnxn: Session 0x145f9cff031000f for
> server null, unexpected error, closing socket connection and attempting
> reconnect
> java.net.ConnectException: Connection refused
>     at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>     at
> sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
>     at
> org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:350)
>     at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1068)
> 14/05/14 01:54:27 INFO zookeeper.ClientCnxn: Opening socket connection to
> server mercado-12.hpl.hp.com/15.25.119.147:22181. Will not attempt to
> authenticate using SASL (unknown error)
> 14/05/14 01:54:27 WARN zookeeper.ClientCnxn: Session 0x145f9cff031000f for
> server null, unexpected error, closing socket connection and attempting
> reconnect
> java.net.ConnectException: Connection refused
>     at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>     at
> sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
>     at
> org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:350)
>     at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1068)
> 14/05/14 01:54:29 INFO mapred.JobClient:  map 86% reduce 0%
> 14/05/14 01:54:30 INFO zookeeper.ClientCnxn: Opening socket connection to
> server mercado-12.hpl.hp.com/15.25.119.147:22181. Will not attempt to
> authenticate using SASL (unknown error)
> 14/05/14 01:54:30 WARN zookeeper.ClientCnxn: Session 0x145f9cff031000f for
> server null, unexpected error, closing socket connection and attempting
> reconnect
> java.net.ConnectException: Connection refused
>     at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>     at
> sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
>     at
> org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:350)
>     at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1068)
> 14/05/14 01:54:30 WARN zk.ZooKeeperExt: createExt: Connection loss on
> attempt 1, waiting 5000 msecs before retrying.
> org.apache.zookeeper.KeeperException$ConnectionLossException:
> KeeperErrorCode = ConnectionLoss for
> /_hadoopBsp/job_201405140108_0003/_cleanedUpDir/client
>     at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
>     at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
>     at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:783)
>     at org.apache.giraph.zk.ZooKeeperExt.createExt(ZooKeeperExt.java:152)
>     at
> org.apache.giraph.job.JobProgressTracker$2.run(JobProgressTracker.java:123)
>     at java.lang.Thread.run(Thread.java:745)
> 14/05/14 01:54:30 INFO mapred.JobClient: Job complete:
> job_201405140108_0003
> 14/05/14 01:54:30 INFO mapred.JobClient: Counters: 6
> 14/05/14 01:54:30 INFO mapred.JobClient:   Job Counters
> 14/05/14 01:54:30 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=30036780
> 14/05/14 01:54:30 INFO mapred.JobClient:     Total time spent by all
> reduces waiting after reserving slots (ms)=0
> 14/05/14 01:54:30 INFO mapred.JobClient:     Total time spent by all maps
> waiting after reserving slots (ms)=0
> 14/05/14 01:54:30 INFO mapred.JobClient:     Launched map tasks=15
> 14/05/14 01:54:30 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=0
> 14/05/14 01:54:30 INFO mapred.JobClient:     Failed map tasks=1
>
>  Regards
>  Arun
>
>
>
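
The progress line quoted in this thread ("min free memory on worker 6 -
0.81MB, average 11.56MB") can also be checked automatically. The following
is a minimal Python sketch, not part of Giraph: the function names and the
1 MB threshold are assumptions for illustration, and the regular expression
is written only against the single log line shown here, joined back onto
one line.

```python
import re

# Pattern written against the one JobProgressTracker line in this thread.
PROGRESS_RE = re.compile(
    r"min free memory on worker (?P<worker>\d+) - (?P<min_mb>[\d.]+)MB, "
    r"average (?P<avg_mb>[\d.]+)MB"
)


def parse_free_memory(line):
    """Return (worker, min_free_mb, avg_free_mb), or None if the line does not match."""
    match = PROGRESS_RE.search(line)
    if match is None:
        return None
    return (
        int(match.group("worker")),
        float(match.group("min_mb")),
        float(match.group("avg_mb")),
    )


def is_memory_critical(line, threshold_mb=1.0):
    """True if the line reports a minimum free memory below threshold_mb (hypothetical cutoff)."""
    parsed = parse_free_memory(line)
    return parsed is not None and parsed[1] < threshold_mb


SAMPLE = (
    "14/05/14 01:54:01 INFO job.JobProgressTracker: Data from 14 workers - "
    "Compute superstep 2: 0 out of 4847571 vertices computed; 0 out of 196 "
    "partitions computed; min free memory on worker 6 - 0.81MB, average 11.56MB"
)

if __name__ == "__main__":
    print(parse_free_memory(SAMPLE))
    print(is_memory_critical(SAMPLE))
```

Running such a check over the job client output would flag the low-memory
superstep before the ZooKeeper connection errors start to appear.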
