giraph-dev mailing list archives

From Arghya Kusum Das <arghyakusumdas2...@gmail.com>
Subject Giraph job fails on large data and large number of nodes
Date Sat, 01 Nov 2014 18:27:46 GMT
Hi,

My Giraph program runs correctly on small data with a small number of nodes
(e.g. 10GB of data on 32 nodes).
I then tried to run it on 128 nodes, each with 32GB RAM, 16 cores, and a
240GB HDD. The graph size is 91GB, and the job failed with the following
exception in the log. Can anyone help me resolve it?

2014-11-01 12:54:43,364 INFO org.apache.giraph.comm.netty.NettyServer: start: Using Netty without authentication.
2014-11-01 12:54:43,386 INFO org.apache.giraph.comm.netty.NettyServer: start: Using Netty without authentication.
2014-11-01 12:54:43,414 INFO org.apache.giraph.comm.netty.NettyServer: start: Using Netty without authentication.
2014-11-01 12:54:43,417 INFO org.apache.giraph.comm.netty.NettyServer: start: Using Netty without authentication.
2014-11-01 12:54:44,363 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server mike070/204.90.45.70:22181
2014-11-01 12:54:44,364 WARN org.apache.zookeeper.ClientCnxn: Session 0x1496c7e01e30002 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
        at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:592)
        at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1119)
2014-11-01 12:54:44,464 WARN org.apache.giraph.zk.ZooKeeperExt: createExt: Connection loss on attempt 0, waiting 5000 msecs before retrying.
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /_hadoopBsp/job_201411011248_0003/_masterJobState
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:90)
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
        at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:637)
        at org.apache.giraph.zk.ZooKeeperExt.createExt(ZooKeeperExt.java:152)
        at org.apache.giraph.bsp.BspService.getJobState(BspService.java:670)
        at org.apache.giraph.master.BspServiceMaster.becomeMaster(BspServiceMaster.java:843)
        at org.apache.giraph.master.MasterThread.run(MasterThread.java:98)
2014-11-01 12:54:44,638 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server mike070/204.90.45.70:22181
2014-11-01 12:54:44,639 WARN org.apache.zookeeper.ClientCnxn: Session 0x1496c7e01e30010 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
        at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:592)
        at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1119)
2014-11-01 12:54:46,159 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server mike070/204.90.45.70:22181
2014-11-01 12:54:46,159 WARN org.apache.zookeeper.ClientCnxn: Session 0x1496c7e01e30002 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
        at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:592)
        at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1119)
2014-11-01 12:54:46,481 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server mike070/204.90.45.70:22181
2014-11-01 12:54:46,481 WARN org.apache.zookeeper.ClientCnxn: Session 0x1496c7e01e30010 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
        at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:592)
        at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1119)
2014-11-01 12:54:47,611 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server mike070/204.90.45.70:22181
2014-11-01 12:54:47,611 WARN org.apache.zookeeper.ClientCnxn: Session 0x1496c7e01e30010 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
        at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:592)
        at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1119)
2014-11-01 12:54:48,234 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server mike070/204.90.45.70:22181
2014-11-01 12:54:48,234 WARN org.apache.zookeeper.ClientCnxn: Session 0x1496c7e01e30002 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
        at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:592)
        at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1119)
2014-11-01 12:54:49,469 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server mike070/204.90.45.70:22181
2014-11-01 12:54:49,469 WARN org.apache.zookeeper.ClientCnxn: Session 0x1496c7e01e30010 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
        at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:592)
-- 
Thanks and regards,
Arghya Kusum Das
(225-362-4031)
