flink-user mailing list archives

From Aljoscha Krettek <aljos...@apache.org>
Subject Re: Flink and swapping question
Date Mon, 29 May 2017 13:19:47 GMT
Hi Flavio,

Is this running on YARN or bare metal? Did you manage to find out where this insanely large
parameter is coming from?

Best,
Aljoscha

> On 25. May 2017, at 19:36, Flavio Pompermaier <pompermaier@okkam.it> wrote:
> 
> Hi to all,
> I think we found the root cause of all the problems. Looking at dmesg, there was a "crazy"
total-vm size associated with the OOM error, far bigger than the TaskManager's available
memory.
> In our case, the TM had a max heap of 14 GB, while the dmesg error reported a required
amount of memory on the order of 60 GB!
> 
> [ 5331.992539] Out of memory: Kill process 24221 (java) score 937 or sacrifice child
> [ 5331.992619] Killed process 24221 (java) total-vm:64800680kB, anon-rss:31387544kB,
file-rss:6064kB, shmem-rss:0kB
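Converting the dmesg figures by hand confirms the mismatch (dmesg reports sizes in KiB):

```shell
# dmesg sizes are in KiB; integer-divide twice to get GiB.
echo "total-vm: $((64800680 / 1024 / 1024)) GiB"   # prints 61
echo "anon-rss: $((31387544 / 1024 / 1024)) GiB"   # prints 29
```

Roughly 61 GiB of reserved virtual memory against a 14 GB max heap.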
> 
> That definitely shouldn't be possible with an ordinary JVM (and our TM was running without
off-heap settings), so we looked at the parameters used to launch the TM JVM, and indeed there
was a really huge amount of memory given to MaxDirectMemorySize. To my big surprise, Flink
runs a TM with this parameter set to 8,388,607T.. does that make any sense??
> Is the importance of this parameter documented anywhere (and why it is set in non
off-heap mode as well)? Is it related to network buffers?
> It should also be documented that this parameter should be added to the TM heap when
reserving memory for Flink (IMHO).
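One way to see what a running TM was actually launched with is to pull the memory flags out of its command line. A minimal sketch; the `TaskManagerRunner` process-name pattern is an assumption and may differ by Flink version:

```shell
# Find the TaskManager JVM (assumed main-class pattern) and list its
# heap and direct-memory flags.
TM_PID=$(pgrep -f TaskManagerRunner | head -n 1)
if [ -n "$TM_PID" ]; then
  ps -p "$TM_PID" -o args= | tr ' ' '\n' \
    | grep -E '^-Xmx|^-XX:MaxDirectMemorySize'
else
  echo "no TaskManager process found on this host"
fi
```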
> 
> I hope these painful sessions of Flink troubleshooting will prove valuable sooner
or later..
> 
> Best,
> Flavio
> 
> On Thu, May 25, 2017 at 10:21 AM, Flavio Pompermaier <pompermaier@okkam.it> wrote:
> I can confirm that after giving less memory to the Flink TM the job was able to run successfully.
> After almost 2 weeks of pain, we summarize here our experience with Flink in virtualized
environments (such as VMware ESXi):
> - Disable the virtualization "feature" that transfers a VM from a (heavily loaded) physical
machine to another one (to balance resource consumption).
> - Check dmesg when a TM dies without logging anything (usually it goes OOM and the OS kills
it, and that is where you can find the trace of it).
> - CentOS 7 on ESXi seems to start swapping VERY early (in my case I see the OS start
swapping even with 12 out of 32 GB of memory free)!
> We're still investigating how this behavior could be fixed: the problem is that it's
better not to disable swapping, because otherwise VMware could start ballooning (which is definitely
worse...).
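For the early-swapping symptom, one mitigation worth testing is lowering (rather than disabling) the kernel's swap aggressiveness. A sketch only; the value 10 is an assumption to tune for your workload, not a recommendation from this thread:

```shell
# Show the current swap aggressiveness (0-100; higher = swap earlier).
sysctl vm.swappiness
# Lower it without disabling swap, so VMware ballooning is not provoked.
sudo sysctl -w vm.swappiness=10
# Persist the setting across reboots.
echo 'vm.swappiness = 10' | sudo tee /etc/sysctl.d/99-swappiness.conf
```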
> 
> I hope these tips could save someone else's day..
> 
> Best,
> Flavio
> 
> On Wed, May 24, 2017 at 4:28 PM, Flavio Pompermaier <pompermaier@okkam.it> wrote:
> Hi Greg, you were right! After typing dmesg I found "Out of memory: Kill process 13574
(java)".
> This is really strange because the JVM of the TM is very calm.
> Moreover, there are 7 GB of memory available (out of 32), but somehow the OS decides to
start swapping and, when it runs out of available swap space, it kills the
Flink TM :(
> 
> Any idea of what's going on here?
> 
> On Wed, May 24, 2017 at 2:32 PM, Flavio Pompermaier <pompermaier@okkam.it> wrote:
> Hi Greg,
> I carefully monitored all TM memory with jstat -gcutil and there's no full GC, only young collections.
> The initial situation on the dying TM is:
> 
>   S0     S1     E      O      M     CCS    YGC     YGCT    FGC    FGCT     GCT   
>   0.00 100.00  33.57  88.74  98.42  97.17    159    2.508     1    0.255    2.763
>   0.00 100.00  90.14  88.80  98.67  97.17    197    2.617     1    0.255    2.873
>   0.00 100.00  27.00  88.82  98.75  97.17    234    2.730     1    0.255    2.986
> 
> After about 10 hours of processing is:
> 
>   0.00 100.00  21.74  83.66  98.52  96.94   5519   33.011     1    0.255   33.267
>   0.00 100.00  21.74  83.66  98.52  96.94   5519   33.011     1    0.255   33.267
>   0.00 100.00  21.74  83.66  98.52  96.94   5519   33.011     1    0.255   33.267
> 
> So I don't think that a heap OOM could be the explanation.
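For reference, the tables above come from sampling the TM with `jstat -gcutil`; the telling columns are YGC (7, young-GC count) and FGC (9, full-GC count). Something along these lines, where the `TaskManagerRunner` pattern is an assumed process name:

```shell
# Sample GC counters of the TaskManager JVM every 5 seconds, 12 samples.
# An FGC count that stays flat while the OS still kills the process points
# at native/direct memory pressure rather than the Java heap.
TM_PID=$(pgrep -f TaskManagerRunner | head -n 1)
if [ -n "$TM_PID" ]; then
  jstat -gcutil "$TM_PID" 5000 12
else
  echo "no TaskManager process found on this host"
fi
```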
> 
> However, the cluster is running on ESXi vSphere VMs, and we have already experienced unexpected
job crashes caused by ESXi moving a heavily loaded VM to another (less loaded) physical machine.. I
wouldn't be surprised if swapping is also handled somehow differently..
> Looking at the Cloudera widgets I see that the crash is usually preceded by an intense cpu_iowait
period.
> I fear that Flink's unsafe memory access could be a problem in those scenarios. Am I
wrong?
> 
> Any insight or debugging technique is greatly appreciated.
> Best,
> Flavio
> 
> 
> On Wed, May 24, 2017 at 2:11 PM, Greg Hogan <code@greghogan.com> wrote:
> Hi Flavio,
> 
> Flink handles interrupts so the only silent killer I am aware of is Linux's OOM killer.
Are you seeing such a message in dmesg?
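A quick way to check is to grep the kernel ring buffer; a sketch, since the exact message wording varies by kernel version:

```shell
# Search kernel messages for OOM-killer activity; on systems where dmesg
# has rotated, /var/log/messages or journalctl -k hold the same records.
dmesg | grep -iE 'out of memory|killed process' \
  || echo "no OOM-killer events logged"
```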
> 
> Greg
> 
> On Wed, May 24, 2017 at 3:18 AM, Flavio Pompermaier <pompermaier@okkam.it> wrote:
> Hi to all,
> I'd like to know whether memory swapping could cause a TaskManager crash.
> In my cluster of virtual machines I'm seeing this strange behavior:
sometimes, if memory gets swapped, the TaskManager on that machine dies unexpectedly without
logging any error.
> 
> Is that possible or not?
> 
> Best,
> Flavio
> 
> 
> 

