flink-issues mailing list archives

From "Aljoscha Krettek (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (FLINK-8622) flink-mesos: High memory usage of scheduler + job manager. GC never kicks in.
Date Fri, 09 Feb 2018 11:17:00 GMT

     [ https://issues.apache.org/jira/browse/FLINK-8622?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel

Aljoscha Krettek updated FLINK-8622:
    Priority: Blocker  (was: Major)

> flink-mesos: High memory usage of scheduler + job manager. GC never kicks in.
> -----------------------------------------------------------------------------
>                 Key: FLINK-8622
>                 URL: https://issues.apache.org/jira/browse/FLINK-8622
>             Project: Flink
>          Issue Type: Bug
>          Components: Distributed Coordination, Mesos, ResourceManager
>    Affects Versions: 1.4.0, 1.3.2
>            Reporter: Bhumika Bayani
>            Priority: Blocker
>             Fix For: 1.5.0
> We are deploying a 1 job manager + 6 taskmanager flink cluster on mesos.
> We have observed that the memory usage of the 'jobmanager' is high. Despite allocating
> more and more memory to it, it hits the limit within minutes.
> We started with 1.5 GB RAM and a 1 GB heap. Currently we have allocated 4 GB RAM and a
> 3 GB heap to the jobmanager cum scheduler. We also tried allocating 8 GB RAM with the same
> 3 GB heap. In that case too, the memory graph was identical.
> As per the graph below, the scheduler almost always runs with maximum memory resources.
> !flink-mem-usage-graph-for-jira.png!
> Throughout the run of the scheduler, we do not see memory usage going down unless it
> is killed due to OOM. From this we infer that garbage collection is never happening.
> We have tried both Flink versions 1.4 and 1.3 and see the same issue on both.
> Is there any way we can find out where and how memory is being used?
> Are there any Flink config options for the jobmanager, or JVM parameters, which can help
> us restrict the memory usage, force garbage collection, and prevent it from crashing?
> Please let us know if there are any resource recommendations from Flink for running Flink
> on Mesos at scale.
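
One way to check the inference that garbage collection never happens is to read the JVM's own counters instead of the container memory graph. The following is a minimal sketch, not part of the original report: it uses the standard java.lang.management API to print heap usage and the cumulative GC count of the JVM it runs in. The class name HeapGcProbe and its helper methods are hypothetical, and the sketch assumes it is run as a standalone program; inspecting a live jobmanager would instead require attaching to that process (for example with the JDK's jstat or jmap tools).

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

// Hypothetical helper for illustration; not part of Flink.
public class HeapGcProbe {

    /** Sums the collection counts reported by all garbage collectors in this JVM. */
    static long totalGcCount() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount(); // -1 when the collector does not report a count
            if (count > 0) {
                total += count;
            }
        }
        return total;
    }

    /** Returns the number of heap bytes currently in use in this JVM. */
    static long heapUsedBytes() {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        return heap.getUsed();
    }

    public static void main(String[] args) {
        System.out.println("heap used (bytes): " + heapUsedBytes());
        System.out.println("GC runs so far:    " + totalGcCount());

        // Allocate short-lived garbage so the collectors have something to reclaim.
        for (int i = 0; i < 200; i++) {
            byte[] junk = new byte[1024 * 1024];
            junk[0] = 1;
        }

        System.out.println("heap used (bytes): " + heapUsedBytes());
        System.out.println("GC runs so far:    " + totalGcCount());
    }
}
```

If the GC count rises while the process memory graph stays flat, collections are happening and the graph is likely showing memory the JVM has reserved rather than live objects.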

This message was sent by Atlassian JIRA
