From: José Luis Larroque
Date: Fri, 26 Aug 2016 21:24:28 -0300
Subject: Giraph application gets stuck on superstep 4, all workers active but without progress
To: user@giraph.apache.org

Hi again guys!

I'm doing a BFS search through the Wikipedia (Spanish edition) site. I converted the dump (https://dumps.wikimedia.org/eswiki/20160601) into a file that Giraph can read.

The BFS is searching for paths, and everything is fine until it gets stuck at some point of superstep four.

I'm using a cluster of 5 nodes (4 core slaves, 1 master) on AWS. Each node is an r3.8xlarge EC2 instance. The command for executing the BFS is this one:

    /home/hadoop/bin/yarn jar /home/hadoop/giraph/giraph.jar ar.edu.info.unlp.tesina.lectura.grafo.BusquedaDeCaminosNavegacionalesWikiquote -vif ar.edu.info.unlp.tesina.vertice.estructuras.IdTextWithComplexValueInputFormat -vip /user/hduser/input/grafo-wikipedia.txt -vof ar.edu.info.unlp.tesina.vertice.estructuras.IdTextWithComplexValueOutputFormat -op /user/hduser/output/caminosNavegacionales -w 4 -yh 120000 -ca giraph.useOutOfCoreMessages=true,giraph.metrics.enable=true,giraph.maxMessagesInMemory=1000000000,giraph.isStaticGraph=true,giraph.logLevel=Debug

Each container has (almost) 120 GB. I'm using a 1000M message limit for out-of-core because I believed that was the problem, but apparently it is not.
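In case it helps to picture the workload: my real computation (BusquedaDeCaminosNavegacionalesWikiquote) keeps track of the navigational paths themselves, so the vertex value is more complex, but the shape of the compute() is basically a plain BFS like the sketch below (the types and the source page are simplified assumptions, not my actual code):

    import java.io.IOException;

    import org.apache.giraph.graph.BasicComputation;
    import org.apache.giraph.graph.Vertex;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.io.Text;

    // Simplified BFS sketch, NOT the real job: the vertex value here is just the
    // BFS depth (-1 = not reached yet), and the source page is made up.
    public class SimpleBfsComputation
        extends BasicComputation<Text, LongWritable, NullWritable, LongWritable> {

      private static final Text SOURCE = new Text("Argentina"); // hypothetical source vertex

      @Override
      public void compute(Vertex<Text, LongWritable, NullWritable> vertex,
          Iterable<LongWritable> messages) throws IOException {
        if (getSuperstep() == 0) {
          boolean isSource = vertex.getId().equals(SOURCE);
          vertex.setValue(new LongWritable(isSource ? 0 : -1));
          if (isSource) {
            // The frontier starts at the source page.
            sendMessageToAllEdges(vertex, new LongWritable(1));
          }
        } else if (vertex.getValue().get() == -1) {
          // First time this page is reached: record the depth and expand the frontier.
          long depth = Long.MAX_VALUE;
          for (LongWritable message : messages) {
            depth = Math.min(depth, message.get());
          }
          if (depth != Long.MAX_VALUE) {
            vertex.setValue(new LongWritable(depth));
            sendMessageToAllEdges(vertex, new LongWritable(depth + 1));
          }
        }
        vertex.voteToHalt();
      }
    }

The point is that every newly reached page sends a message over all of its out-links, so around superstep 4 the frontier covers a large part of the graph and the message volume explodes (that's why some vertices receive 100M or more messages).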
These are the master logs (it seems the master is waiting for the workers to finish, but they just don't, and it stays like this forever):

    16/08/26 00:43:08 INFO yarn.GiraphYarnTask: [STATUS: task-3] MASTER_ZOOKEEPER_ONLY - 0 finished out of 4 on superstep 4
    16/08/26 00:43:08 DEBUG master.BspServiceMaster: barrierOnWorkerList: Got finished worker list = [], size = 0, worker list = [Worker(hostname=ip-172-31-29-14.ec2.internal, MRtaskID=0, port=30000), Worker(hostname=ip-172-31-29-16.ec2.internal, MRtaskID=1, port=30001), Worker(hostname=ip-172-31-29-15.ec2.internal, MRtaskID=2, port=30002), Worker(hostname=ip-172-31-29-14.ec2.internal, MRtaskID=4, port=30004)], size = 4 from /_hadoopBsp/giraph_yarn_application_1472168758138_0002/_applicationAttemptsDir/0/_superstepDir/4/_workerFinishedDir
    16/08/26 00:43:08 INFO yarn.GiraphYarnTask: [STATUS: task-3] MASTER_ZOOKEEPER_ONLY - 0 finished out of 4 on superstep 4
    16/08/26 00:43:08 DEBUG zk.PredicateLock: waitMsecs: Wait for 10000
    16/08/26 00:43:18 DEBUG zk.PredicateLock: waitMsecs: Got timed signaled of false

The last two PredicateLock lines repeat about thirty times, and then the whole block above appears again.

And in *all* the workers there is no information about what is happening (I'm testing this with giraph.logLevel=Debug because with the default Giraph log level I was lost); the workers just say this over and over again:

    16/08/26 01:05:08 INFO utils.ProgressableUtils: waitFor: Future result not ready yet java.util.concurrent.FutureTask@7392f34d
    16/08/26 01:05:08 INFO utils.ProgressableUtils: waitFor: Waiting for org.apache.giraph.utils.ProgressableUtils$FutureWaitable@34a37f82

Before starting superstep 4, the information on each worker was the following:

    16/08/26 00:43:08 INFO yarn.GiraphYarnTask: [STATUS: task-2] startSuperstep: WORKER_ONLY - Attempt=0, Superstep=4
    16/08/26 00:43:08 DEBUG worker.BspServiceWorker: startSuperstep: addressesAndPartitions[Worker(hostname=ip-172-31-29-14.ec2.internal, MRtaskID=0, port=30000), Worker(hostname=ip-172-31-29-16.ec2.internal, MRtaskID=1, port=30001), Worker(hostname=ip-172-31-29-15.ec2.internal, MRtaskID=2, port=30002), Worker(hostname=ip-172-31-29-14.ec2.internal, MRtaskID=4, port=30004)]
    16/08/26 00:43:08 DEBUG worker.BspServiceWorker: 0 Worker(hostname=ip-172-31-29-14.ec2.internal, MRtaskID=0, port=30000)
    16/08/26 00:43:08 DEBUG worker.BspServiceWorker: 1 Worker(hostname=ip-172-31-29-16.ec2.internal, MRtaskID=1, port=30001)
    16/08/26 00:43:08 DEBUG worker.BspServiceWorker: 2 Worker(hostname=ip-172-31-29-15.ec2.internal, MRtaskID=2, port=30002)
    16/08/26 00:43:08 DEBUG worker.BspServiceWorker: 3 Worker(hostname=ip-172-31-29-14.ec2.internal, MRtaskID=4, port=30004)
    16/08/26 00:43:08 DEBUG worker.BspServiceWorker: 4 Worker(hostname=ip-172-31-29-14.ec2.internal, MRtaskID=0, port=30000)
    16/08/26 00:43:08 DEBUG worker.BspServiceWorker: 5 Worker(hostname=ip-172-31-29-16.ec2.internal, MRtaskID=1, port=30001)
    16/08/26 00:43:08 DEBUG worker.BspServiceWorker: 6 Worker(hostname=ip-172-31-29-15.ec2.internal, MRtaskID=2, port=30002)
    16/08/26 00:43:08 DEBUG worker.BspServiceWorker: 7 Worker(hostname=ip-172-31-29-14.ec2.internal, MRtaskID=4, port=30004)
    16/08/26 00:43:08 DEBUG worker.BspServiceWorker: 8 Worker(hostname=ip-172-31-29-14.ec2.internal, MRtaskID=0, port=30000)
    16/08/26 00:43:08 DEBUG worker.BspServiceWorker: 9 Worker(hostname=ip-172-31-29-16.ec2.internal, MRtaskID=1, port=30001)
    16/08/26 00:43:08 DEBUG worker.BspServiceWorker: 10 Worker(hostname=ip-172-31-29-15.ec2.internal, MRtaskID=2, port=30002)
    16/08/26 00:43:08 DEBUG worker.BspServiceWorker: 11 Worker(hostname=ip-172-31-29-14.ec2.internal, MRtaskID=4, port=30004)
    16/08/26 00:43:08 DEBUG worker.BspServiceWorker: 12 Worker(hostname=ip-172-31-29-14.ec2.internal, MRtaskID=0, port=30000)
    16/08/26 00:43:08 DEBUG worker.BspServiceWorker: 13 Worker(hostname=ip-172-31-29-16.ec2.internal, MRtaskID=1, port=30001)
    16/08/26 00:43:08 DEBUG worker.BspServiceWorker: 14 Worker(hostname=ip-172-31-29-15.ec2.internal, MRtaskID=2, port=30002)
    16/08/26 00:43:08 DEBUG worker.BspServiceWorker: 15 Worker(hostname=ip-172-31-29-14.ec2.internal, MRtaskID=4, port=30004)
    16/08/26 00:43:08 DEBUG graph.GraphTaskManager: execute: Memory (free/total/max) = 92421.41M / 115000.00M / 115000.00M
I don't know what exactly is failing:

- I know that all containers have memory available; on the datanodes I checked that each one had around 50 GB free.
- I'm not sure whether I'm hitting some limit in the use of out-of-core. I know that writing messages too fast is dangerous with Giraph 1.1, but if I hit that limit I suppose the container would fail, right?
- Maybe the ZooKeeper client connections aren't enough? I read that the default value of 60 for maxClientCnxns in ZooKeeper may be too small in a context like AWS, but I'm not familiar enough with the relationship between Giraph and ZooKeeper to start changing default configuration values.
- Maybe I have to tune the out-of-core configuration, using giraph.maxNumberOfOpenRequests and giraph.waitForRequestsConfirmation=true like someone recommended here (http://mail-archives.apache.org/mod_mbox/giraph-user/201209.mbox/%3CCC775449.2C4B%25majakabiljo@fb.com%3E)? (I put a sketch of the combined options after this list.)
- Should I tune the Netty configuration? I have the defaults, but I believe that 8 Netty client threads and 8 server threads would be enough, since I have only a few workers, and maybe too many Netty threads are creating the overhead that makes the entire application get stuck.
- Using giraph.useBigDataIOForMessages=true didn't help either. I know that each vertex is receiving 100M or more messages, so that property should be helpful, but it didn't make any difference.
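To make those last points concrete, this is a sketch of the combined settings I'm thinking of trying next. The values are guesses, and I'm not even sure the Netty property names are spelled giraph.nettyClientThreads / giraph.nettyServerThreads, so please correct me if they aren't:

    -ca giraph.useOutOfCoreMessages=true,giraph.metrics.enable=true,giraph.isStaticGraph=true,giraph.maxMessagesInMemory=1000000000,giraph.useBigDataIOForMessages=true,giraph.maxNumberOfOpenRequests=1000,giraph.waitForRequestsConfirmation=true,giraph.nettyClientThreads=8,giraph.nettyServerThreads=8

    # and, for the ZooKeeper hypothesis, in zoo.cfg (60 is the default):
    maxClientCnxns=200

Does a combination like that make sense, or am I mixing options that work against each other?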
As you can probably tell, I have too many hypotheses; that's why I'm asking for help, so I can go in the right direction.

Any help would be greatly appreciated.

Bye!
Jose