Subject: Re: Flink and swapping question
From: Flavio Pompermaier <pompermaier@okkam.it>
Date: Tue, 6 Jun 2017 19:42:12 +0200
To: Stephan Ewen
Cc: Greg Hogan, Aljoscha Krettek, Fabian Hueske, user <user@flink.apache.org>, Nico Kruber

Hi Stephan,
I also think that the error is more related to Netty.
The only suspicious libraries I use are Parquet and Thrift.
I'm not using off-heap memory.
What do you mean by "crazy high number of concurrent network shuffles"? How can I count that?
We're using Java 8.

Thanks a lot,
Flavio

On 6 Jun 2017 7:13 pm, "Stephan Ewen" <sewen@apache.org> wrote:
Hi!

I would actually be surprised if this is an issue in core Flink.

  - The MaxDirectMemory parameter is pretty meaningless, it really is a max and does not have an impact on how much is actually allocated.

  - In most cases we had reported so far, the leak was in a library that was used in the user code.

  - If you do not use offheap memory in Flink, then there are a few other culprits that can cause high virtual memory consumption:
      - Netty, if you bumped the Netty version in a custom build
      - Flink's Netty, if the job has a crazy high number of concurrent network shuffles (we are talking 1000s here)
      - Some old Java versions have I/O memory leaks (I think some older Java 6 and Java 7 versions were affected)

To diagnose that better:
  - Are these batch or streaming jobs?
  - If it is streaming, which state backend are you using?

Stephan
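
Two of those culprits can be checked quickly from a worker's shell. A rough sketch only (the job-jar path is a hypothetical placeholder, not something from this thread):

  # old Java 6 / Java 7 builds were the ones with the known I/O leaks
  java -version 2>&1 | head -n 1

  # check whether the job (fat) jar drags in its own Netty; replace the path with your actual jar
  unzip -l path/to/your-job.jar | grep -i netty
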
On Tue, Jun 6, 2017 at 12:00 PM, Fabian Hueske <fhueske@gmail.com> wrote:
Hi Flavio,

can you post all the memory configuration parameters of your workers?
Did you investigate whether the direct or the heap memory grew?

Thanks, Fabian
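
In case it helps to gather those numbers, a minimal sketch for one worker (it assumes a standalone setup where each machine runs a single TaskManager JVM and the stock JDK tools are on the PATH; the commands are generic, not taken from this thread):

  # PID of the TaskManager JVM on this worker
  TM_PID=$(jps -l | grep -i taskmanager | awk '{print $1}')

  # heap and direct-memory flags the TM was actually started with
  ps -o command= -p "$TM_PID" | tr ' ' '\n' | grep -E '^-Xm[sx]|^-XX:MaxDirectMemorySize'

  # what the OS sees: virtual size, resident set and swapped-out memory
  grep -E 'VmSize|VmRSS|VmSwap' "/proc/$TM_PID/status"

Sampling the last command periodically while the job runs shows whether the growth is in resident memory, in swap, or only in the virtual size.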

2017-05-29 20:53 GMT+02:00 Flavio Pompermaier <pompermaier@okkam.it>:
Hi to all,
I'm still trying to understand what's going on in our production Flink cluster.
The facts are:

1. The Flink cluster runs on 5 VMWare VMs managed by ESXi.
2. On a specific job we have, without limiting the direct memory to 5g, the TM gets killed by the OS almost immediately because the memory required by the TM, at some point, becomes huge, like > 100 GB (other jobs seem to be less affected by the problem).
3. Although the memory consumption is much better this way, the Flink TM memory continuously grows job after job (of this problematic type): we set the TM max heap to 14 GB and the memory required by the JVM can be ~30 GB. How is that possible?

My fear is that there's some annoying memory leak / bad memory allocation at the Flink network level, but I can't find any evidence of this (except the fact that the VM which doesn't have an HDFS datanode underneath the Flink TM is the one with the biggest TM virtual memory consumption).

Thanks for the help,
Flavio

On 29 May 2017 15:37, "Nico Kruber" <nico@data-artisans.com> wrote:
FYI: taskmanager.sh sets this parameter but also states the following:

  # Long.MAX_VALUE in TB: This is an upper bound, much less direct memory will be used
  TM_MAX_OFFHEAP_SIZE="8388607T"


Nico
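
For what it's worth, a quick way to see both where that value comes from and what the running TaskManager JVM actually applied. This is only a sketch, assuming a HotSpot JDK with the standard jps/jinfo tools and $FLINK_HOME pointing at the Flink installation; none of it is quoted from the thread:

  # the default set by the start script
  grep -n MAX_OFFHEAP "$FLINK_HOME/bin/taskmanager.sh"

  # the limit the running TaskManager JVM actually uses (printed in bytes)
  TM_PID=$(jps -l | grep -i taskmanager | awk '{print $1}')
  jinfo -flag MaxDirectMemorySize "$TM_PID"

If you want to pin it to an explicit value such as 5g instead, passing -XX:MaxDirectMemorySize=5g through env.java.opts in flink-conf.yaml is one option; since the start script also sets the flag, it is worth checking with ps which occurrence ends up last on the command line, because the JVM generally honors the last one.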

On Monday, 29 May 2017 15:19:47 CEST Aljoscha Krettek wrote:
> Hi Flavio,
>
> Is this running on YARN or bare metal? Did you manage to find out where this insanely large parameter is coming from?
>
> Best,
> Aljoscha
>
> > On 25. May 2017, at 19:36, Flavio Pompermaier <pompermaier@okkam.it> wrote:
> >
> > Hi to all,
> > I think we found the root cause of all the problems. Looking at dmesg, there was a "crazy" total-vm size associated with the OOM error, a LOT bigger than the TaskManager's available memory. In our case, the TM had a max heap of 14 GB while the dmesg error was reporting a required amount of memory in the order of 60 GB!
> >
> > [ 5331.992539] Out of memory: Kill process 24221 (java) score 937 or sacrifice child
> > [ 5331.992619] Killed process 24221 (java) total-vm:64800680kB, anon-rss:31387544kB, file-rss:6064kB, shmem-rss:0kB
> >
> > That shouldn't have been possible with an ordinary JVM (and our TM was running without off-heap settings), so we looked at the parameters used to run the TM JVM and indeed there was a really huge amount of memory given to MaxDirectMemorySize. To my big surprise, Flink runs a TM with this parameter set to 8388607T... does that make any sense?? Is the importance of this parameter documented anywhere (and why it is used in non off-heap mode as well)? Is it related to network buffers? It should also be documented that this parameter should be added to the TM heap when reserving memory for Flink (IMHO).
> >
> > I hope that this painful session of Flink troubleshooting can be of added value sooner or later..
> >
> > Best,
> > Flavio
> >
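
As a general technique for attributing that kind of virtual-memory growth, a sketch only (NativeMemoryTracking is a standard HotSpot JDK 8 feature and env.java.opts is just the generic hook for extra JVM options; neither is discussed in the thread itself):

  # flink-conf.yaml (restart the TM afterwards; tracking adds a small overhead):
  #   env.java.opts: -XX:NativeMemoryTracking=summary

  # then ask the running TaskManager where its non-heap memory went (TM_PID as in the earlier sketches)
  jcmd "$TM_PID" VM.native_memory summary

Comparing the NMT total against the OS-reported virtual size also shows how much is allocated by code the JVM does not track, for example native libraries loaded by user code.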
> > On Thu, May 25, 2017 at 10:21 AM, Flavio Pompermaier <pompermaier@okkam.it> wrote:
> > I can confirm that after giving less memory to the Flink TM the job was able to run successfully. After almost 2 weeks of pain, we summarize here our experience with Flink in virtualized environments (such as VMWare ESXi):
> > - Disable the virtualization "feature" that transfers a VM from a (heavily loaded) physical machine to another one (to balance the resource consumption).
> > - Check dmesg when a TM dies without logging anything (usually it goes OOM and the OS kills it, but that is where you can find the log of it).
> > - CentOS 7 on ESXi seems to start swapping VERY early (in my case I see the OS start swapping even when 12 out of 32 GB are still free)! We're still investigating how this behavior could be fixed: the problem is that it's better not to disable swapping, because otherwise VMWare could start ballooning (which is definitely worse...).
> >
> > I hope these tips can save someone else's day..
> >
> > Best,
> > Flavio
> >
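
On the swapping point, a common middle ground (a generic CentOS 7 sketch, not something from this thread; it assumes root access and that swap stays enabled for the balloon driver) is to make the kernel far less eager to swap:

  cat /proc/sys/vm/swappiness        # current value, typically 30 or 60 by default
  sudo sysctl vm.swappiness=1        # take effect immediately
  echo "vm.swappiness = 1" | sudo tee /etc/sysctl.d/99-swappiness.conf   # persist across reboots

With a value of 1 the kernel swaps only as a last resort, which keeps swap available as a safety valve without the very early swapping described above.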
> > On Wed, May 24, 2017 at 4:28 PM, Flavio Pompermaier <pompermaier@okkam.it> wrote:
> > Hi Greg, you were right! After typing dmesg I found "Out of memory: Kill process 13574 (java)". This is really strange because the JVM of the TM is very calm.
> > Moreover, there are 7 GB of memory available (out of 32) but somehow the OS decides to start swapping and, when it runs out of available swap memory, the OS decides to kill the Flink TM :(
> >
> > Any idea of what's going on here?
> >
> > On Wed, May 24, 2017 at 2:32 PM, Flavio Pompermaier <pompermaier@okkam.it> wrote:
> > Hi Greg,
> > I carefully monitored all TM memory with jstat -gcutil and there's no full gc, only ...
> > The initial situation on the dying TM is:
> >
> >    S0     S1      E      O      M    CCS    YGC    YGCT   FGC   FGCT     GCT
> >  0.00 100.00  33.57  88.74  98.42  97.17    159   2.508     1  0.255   2.763
> >  0.00 100.00  90.14  88.80  98.67  97.17    197   2.617     1  0.255   2.873
> >  0.00 100.00  27.00  88.82  98.75  97.17    234   2.730     1  0.255   2.986
> >
> > After about 10 hours of processing it is:
> >
> >  0.00 100.00  21.74  83.66  98.52  96.94   5519  33.011     1  0.255  33.267
> >  0.00 100.00  21.74  83.66  98.52  96.94   5519  33.011     1  0.255  33.267
> >  0.00 100.00  21.74  83.66  98.52  96.94   5519  33.011     1  0.255  33.267
> >
> > So I don't think that OOM could be an option.
> >
> > However, the cluster is running on ESXi vSphere VMs and we have already experienced unexpected job crashes because of ESXi moving a heavy-loaded VM to another (less loaded) physical machine.. I wouldn't be surprised if swapping is also handled somehow differently.. Looking at the Cloudera widgets I see that the crash is usually preceded by an intense cpu_iowait period. I fear that Flink's unsafe access to memory could be a problem in those scenarios. Am I wrong?
> >
> > Any insight or debugging technique is greatly appreciated.
> > Best,
> > Flavio
> >
> >
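
For correlating those cpu_iowait spikes with swapping, a generic sketch using only standard Linux tools (nothing Flink-specific):

  # swap-in/out (si/so) and iowait (wa), sampled every 5 seconds
  vmstat 5

  # how much of the TaskManager process itself has been swapped out (TM_PID as in the earlier sketches)
  grep VmSwap "/proc/$TM_PID/status"

If VmSwap grows while si/so and wa climb just before the crash, that points at host-level swapping rather than at anything inside the JVM.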
> > On Wed, May 24, 2017 at 2:11 PM, Greg Hogan <code@greghogan.com> wrote:
> > Hi Flavio,
> >
> > Flink handles interrupts so the only silent killer I am aware of is Linux's OOM killer. Are you seeing such a message in dmesg?
> >
> > Greg
> >
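
For reference, a minimal way to check for that on a worker (a sketch; dmesg -T needs a reasonably recent util-linux, and the journalctl variant assumes systemd, as on CentOS 7):

  dmesg -T | grep -iE "out of memory|killed process"

  # on systemd machines the kernel log is also kept in the journal
  journalctl -k | grep -i "killed process"

Both should show lines like the "Killed process ... total-vm:... anon-rss:..." entries quoted earlier in the thread.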
> > On Wed, May 24, 2017 at 3:18 AM, Flavio Pompermaier <pompermaier@okkam.it> wrote:
> > Hi to all,
> > I'd like to know whether memory swapping could cause a taskmanager crash.
> > In my cluster of virtual machines I'm seeing this strange behavior: sometimes, if memory gets swapped, the taskmanager (on that machine) dies unexpectedly without any log about the error.
> >
> > Is that possible or not?
> >
> > Best,
> > Flavio



