giraph-user mailing list archives

From Piotr Kozikowski <pi...@liveramp.com>
Subject Netty problems with large dataset
Date Sat, 18 Jul 2015 02:05:06 GMT
Hi all,

We've been able to run Giraph on a production cluster on small and moderate
(1.7B edges) datasets. However, when trying a large dataset (10B+ edges),
the workers start logging tons of Netty warnings, and eventually the job as
a whole dies, usually with the master reporting missing workers and killing
the job. All of this happens during superstep -1. Are there any obvious
things to try here?

Thank you,
Piotr

2015-07-17 16:24:46,503 INFO [main] org.apache.giraph.comm.netty.NettyClient: checkRequestsForProblems: Re-issuing request (reqId=29,destAddr=ds0701.liveramp.net:30105,elapsedNanos=61867763,started=Tue Feb 10 18:47:11 PST 1970)
2015-07-17 16:24:46,503 INFO [main] org.apache.giraph.comm.netty.NettyClient: checkRequestsForProblems: Re-issuing request (reqId=55,destAddr=ds0640.liveramp.net:30020,elapsedNanos=61877851,started=Tue Feb 10 18:47:11 PST 1970)
2015-07-17 16:24:46,503 INFO [main] org.apache.giraph.comm.netty.NettyClient: checkRequestsForProblems: Re-issuing request (reqId=61,destAddr=ds0689.liveramp.net:30103,elapsedNanos=61887313,started=Tue Feb 10 18:47:11 PST 1970)
2015-07-17 16:24:46,503 INFO [main] org.apache.giraph.comm.netty.NettyClient: checkRequestsForProblems: Re-issuing request (reqId=51,destAddr=ds0665.liveramp.net:30044,elapsedNanos=61895473,started=Tue Feb 10 18:47:11 PST 1970)

2015-07-17 16:24:54,965 INFO [netty-client-worker-0] org.apache.giraph.comm.netty.handler.ResponseClientHandler: messageReceived: Already received response for (taskId = 93, requestId = 18)
2015-07-17 16:25:03,778 WARN [main] org.apache.giraph.comm.netty.NettyClient: checkRequestsForProblems: Problem with request id (destTask=90,reqId=30) connected = true, future done = false, success = false, cause = null, elapsed time = 609563, destination = ds0619.liveramp.net/10.100.132.111:30090 (reqId=30,destAddr=ds0619.liveramp.net:30090,elapsedNanos=609563278574,started=Tue Feb 10 18:37:18 PST 1970,writeDone=false,writeSuccess=false)
2015-07-17 16:25:03,778 WARN [main] org.apache.giraph.comm.netty.NettyClient: checkRequestsForProblems: Problem with request id (destTask=58,reqId=30) connected = true, future done = false, success = false, cause = null, elapsed time = 609563, destination = ds0649.liveramp.net/10.100.132.141:30058 (reqId=30,destAddr=ds0649.liveramp.net:30058,elapsedNanos=609563350498,started=Tue Feb 10 18:37:18 PST 1970,writeDone=false,writeSuccess=false)
2015-07-17 16:25:03,778 WARN [main] org.apache.giraph.comm.netty.NettyClient: checkRequestsForProblems: Problem with request id (destTask=42,reqId=31) connected = true, future done = false, success = false, cause = null, elapsed time = 609563, destination = ds0671.liveramp.net/10.100.132.163:30042 (reqId=31,destAddr=ds0671.liveramp.net:30042,elapsedNanos=609563356568,started=Tue Feb 10 18:37:18 PST 1970,writeDone=false,writeSuccess=false)
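
For anyone reading the excerpts above, the elapsedNanos values are easier to compare once converted to seconds. The sketch below is only an illustration and uses no Giraph API; the class and method names (ElapsedNanos, parseElapsedNanos, toSeconds) are made up for this example, and the sample strings are fragments copied from the log lines above. The requests flagged in the WARN lines have been outstanding for roughly 609.6 seconds, which matches the "elapsed time = 609563" field if that field is in milliseconds, while the re-issued requests in the INFO lines are only about 62 milliseconds old.

```java
import java.util.OptionalLong;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for reading the log excerpts; not part of Giraph.
public class ElapsedNanos {
    // Matches the elapsedNanos=<digits> field as it appears in the NettyClient log lines.
    private static final Pattern ELAPSED = Pattern.compile("elapsedNanos=(\\d+)");

    // Returns the elapsedNanos value from a log line, or empty if the field is absent.
    static OptionalLong parseElapsedNanos(String logLine) {
        Matcher m = ELAPSED.matcher(logLine);
        return m.find()
            ? OptionalLong.of(Long.parseLong(m.group(1)))
            : OptionalLong.empty();
    }

    // Converts nanoseconds to seconds.
    static double toSeconds(long nanos) {
        return nanos / 1_000_000_000.0;
    }

    public static void main(String[] args) {
        // Fragments copied from the WARN and INFO lines in the message.
        String[] fragments = {
            "reqId=30,destAddr=ds0619.liveramp.net:30090,elapsedNanos=609563278574",
            "reqId=29,destAddr=ds0701.liveramp.net:30105,elapsedNanos=61867763"
        };
        for (String fragment : fragments) {
            parseElapsedNanos(fragment).ifPresent(nanos ->
                System.out.printf("%d ns = %.3f s%n", nanos, toSeconds(nanos)));
        }
    }
}
```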
