flink-user mailing list archives

From Aljoscha Krettek <aljos...@apache.org>
Subject Re: Flink memory leak
Date Tue, 07 Nov 2017 14:08:39 GMT
I agree with Ufuk: it would be helpful to know which stateful operations the jobs use
(including windowing).
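
To illustrate the `clear()` point from Ufuk's earlier reply (quoted below), here is a minimal sketch of why per-key list state keeps growing until it is cleared explicitly. It is a plain-Java stand-in, not Flink's API: the class `KeyedListStateSketch` and its methods are hypothetical names chosen for illustration, and it assumes state is simply a list kept per key.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for keyed list state; this is NOT Flink's API.
public class KeyedListStateSketch {
    // One list per key, analogous to list state scoped to the current key.
    private final Map<String, List<Long>> stateByKey = new HashMap<>();

    // Appends a value to the list held for the given key.
    public void add(String key, long value) {
        stateByKey.computeIfAbsent(key, k -> new ArrayList<>()).add(value);
    }

    // Drops everything held for the key; without this call the list only grows.
    public void clear(String key) {
        stateByKey.remove(key);
    }

    // Total number of elements still referenced, and therefore not collectable by the GC.
    public int retainedElements() {
        int total = 0;
        for (List<Long> values : stateByKey.values()) {
            total += values.size();
        }
        return total;
    }

    public static void main(String[] args) {
        KeyedListStateSketch state = new KeyedListStateSketch();
        for (long i = 0; i < 1000; i++) {
            state.add("key-" + (i % 10), i);
        }
        System.out.println("retained before clear: " + state.retainedElements());
        for (int k = 0; k < 10; k++) {
            state.clear("key-" + k);
        }
        System.out.println("retained after clear: " + state.retainedElements());
    }
}
```

The garbage collector cannot reclaim elements that are still referenced from the state, which is why an explicit clear is needed in this model. Whether this applies to the jobs in question depends on which stateful operations they actually contain.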

> On 7. Nov 2017, at 14:53, Ufuk Celebi <uce@apache.org> wrote:
> 
> Do you use any windowing? If yes, could you please share that code? If
> there is no stateful operation at all, it's strange where the list
> state instances are coming from.
> 
> On Tue, Nov 7, 2017 at 2:35 PM, ebru <b20926247@cs.hacettepe.edu.tr> wrote:
>> Hi Ufuk,
>> 
>> We don’t explicitly define any state descriptor. We only use map and filter
>> operators. We thought that the GC handles clearing Flink’s internal state.
>> So how can we manage the memory if it is always increasing?
>> 
>> - Ebru
>> 
>> On 7 Nov 2017, at 16:23, Ufuk Celebi <uce@apache.org> wrote:
>> 
>> Hey Ebru, the memory usage might be increasing as long as a job is running.
>> This is expected (also in the case of multiple running jobs). The
>> screenshots are not helpful in that regard. :-(
>> 
>> What kind of stateful operations are you using? Depending on your use case,
>> you have to manually call `clear()` on the state instance in order to
>> release the managed state.
>> 
>> Best,
>> 
>> Ufuk
>> 
>> On Tue, Nov 7, 2017 at 12:43 PM, ebru <b20926247@cs.hacettepe.edu.tr> wrote:
>>> 
>>> 
>>> 
>>> Begin forwarded message:
>>> 
>>> From: ebru <b20926247@cs.hacettepe.edu.tr>
>>> Subject: Re: Flink memory leak
>>> Date: 7 November 2017 at 14:09:17 GMT+3
>>> To: Ufuk Celebi <uce@apache.org>
>>> 
>>> Hi Ufuk,
>>> 
>>> Here are three snapshots of the htop output.
>>> 1. The first snapshot is the initial state.
>>> 2. The second snapshot was taken after submitting one job.
>>> 3. The third snapshot shows that one job running at 15000 EPS. The memory
>>> usage is always increasing over time.
>>> 
>>> 
>>> 
>>> 
>>> <1.png><2.png><3.png>
>>> 
>>> On 7 Nov 2017, at 13:34, Ufuk Celebi <uce@apache.org> wrote:
>>> 
>>> Hey Ebru,
>>> 
>>> let me pull in Aljoscha (CC'd) who might have an idea what's causing this.
>>> 
>>> Since multiple jobs are running, it will be hard to tell which job
>>> the state descriptors from the heap snapshot belong to.
>>> - Is it possible to isolate the problem and reproduce the behaviour
>>> with only a single job?
>>> 
>>> – Ufuk
>>> 
>>> 
>>> On Tue, Nov 7, 2017 at 10:27 AM, ÇETİNKAYA EBRU ÇETİNKAYA EBRU
>>> <b20926247@cs.hacettepe.edu.tr> wrote:
>>> 
>>> Hi,
>>> 
>>> We are using Flink 1.3.1 in production, with one job manager and 3 task
>>> managers in standalone mode. Recently, we've noticed memory-related
>>> problems. We run the Flink cluster in Docker containers. We have
>>> 300 slots, and 20 jobs are running with a parallelism of 10. The job
>>> count may also change over time. Task manager memory usage always
>>> increases, and after a job is canceled the memory usage doesn't decrease.
>>> We've tried to investigate the problem and took a task manager JVM heap
>>> snapshot. According to the JVM heap analysis, a possible memory leak is the
>>> Flink list state descriptor, but we are not sure that is the cause of our
>>> memory problem. How can we solve the problem?
>>> 
>>> 
>>> 
>> 
>> 

