flink-user mailing list archives

From 김동일 <kim.s...@gmail.com>
Subject Re: taskmanager memory leak
Date Thu, 21 Jul 2016 10:52:57 GMT
Dear Stephan,

I also suspect S3.
I’ve tried both s3n and s3a.
Which library is suitable? I’m using aws-java-sdk-1.7.4 and hadoop-aws-2.7.2.

Best regards.

> On Jul 21, 2016, at 5:54 PM, Stephan Ewen <sewen@apache.org> wrote:
> 
> Hi!
> 
> There is a memory debugging logger; you can activate it as described here:
> https://ci.apache.org/projects/flink/flink-docs-master/setup/config.html#memory-and-performance-debugging
> 
> It will print which parts of the memory are growing.
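
A minimal sketch of the flink-conf.yaml entries that turn this logger on. The key names are recalled from the linked configuration section and the interval is an illustrative value, so both are assumptions to check against the docs for the Flink version in use:

```yaml
# Assumed keys from the "Memory and Performance Debugging" section; please verify.
taskmanager.debug.memory.startLogThread: true
# Logging interval in milliseconds (illustrative value).
taskmanager.debug.memory.logIntervalMs: 5000
```

The TaskManagers read flink-conf.yaml at startup, so they need a restart before the statistics appear in the TaskManager log.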
> 
> What you can also try is to deactivate checkpointing for one run and see if that solves it. If yes, then I suspect there is a memory leak in the S3 library (are you using s3, s3a, or s3n?).
> 
> Can you also check which libraries you are using? We have seen cases of memory leaks in the libraries people used.
> 
> Greetings,
> Stephan
> 
> 
> 
> On Thu, Jul 21, 2016 at 5:13 AM, 김동일 <kim.same@gmail.com> wrote:
> Hi Stephan,
> 
> - Did you submit any job to the cluster, or is the memory just growing even on an idle TaskManager?
> 
> I am running some streaming jobs.
> 
> - If you are running a job, do you use the RocksDB state backend or the FileSystem state backend?
> 
> The FileSystem state backend, with S3.
> 
> - Does it grow infinitely, or only up to a certain point before going down again?
> 
> I think it grows without bound. The kernel eventually kills the process (OOM).
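
To tell unbounded growth apart from a plateau, the resident set size of the TaskManager process can be sampled over time. The following is a minimal sketch, assuming a Linux host with /proc available; the function names, the minimum sample count, and the 5% tolerance are illustrative choices and not part of Flink:

```python
import re


def parse_vm_rss_kib(status_text):
    """Return the VmRSS value in KiB from the text of /proc/<pid>/status, or None."""
    match = re.search(r"^VmRSS:\s+(\d+)\s+kB", status_text, re.MULTILINE)
    return int(match.group(1)) if match else None


def read_rss_kib(pid):
    """Read the current resident set size of a process (Linux only)."""
    with open(f"/proc/{pid}/status") as handle:
        return parse_vm_rss_kib(handle.read())


def looks_unbounded(samples_kib, tolerance=0.05):
    """Heuristic: True if the later half of the samples keeps rising.

    The series counts as still growing when the last sample exceeds the
    midpoint sample by more than `tolerance` (a hypothetical threshold).
    """
    if len(samples_kib) < 4:
        return False  # too few samples to judge
    midpoint = samples_kib[len(samples_kib) // 2]
    return samples_kib[-1] > midpoint * (1 + tolerance)
```

Calling read_rss_kib(pid) at a fixed interval and passing the collected values to looks_unbounded gives a rough answer to the question above; a series that levels off returns False, while one that keeps climbing toward the OOM kill returns True.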
> 
> 
> 
> On Thu, Jul 21, 2016 at 3:52 AM Stephan Ewen <sewen@apache.org> wrote:
> Hi!
> 
> In order to answer this, we need a bit more information. Here are some follow-up questions:
> 
>   - Did you submit any job to the cluster, or is the memory just growing even on an idle TaskManager?
>   - If you are running a job, do you use the RocksDB state backend or the FileSystem state backend?
>   - Does it grow infinitely, or only up to a certain point before going down again?
> 
> Greetings,
> Stephan
> 
> 
> On Wed, Jul 20, 2016 at 5:58 PM, 김동일 <kim.same@gmail.com> wrote:
> Oh, my Flink version is 1.0.3.
> 
> 
> ---------- Forwarded message ----------
> From: 김동일 <kim.same@gmail.com>
> Date: Thu, Jul 21, 2016 at 12:52 AM
> Subject: taskmanager memory leak
> To: user@flink.apache.org
> 
> 
> I've set up a standalone cluster.
> The TaskManager consumes more memory than the Xmx value, and its usage grows continuously.
> I saw this thread (http://mail-archives.apache.org/mod_mbox/flink-dev/201606.mbox/%3CCAK2vtervsw4muBOc4SWix0mR6Y9biJznjuYpF6_f9f0g9-_6LA@mail.gmail.com%3E),
> so I set taskmanager.memory.preallocate to true, but that did not solve the problem.
> 
> My Java version is:
> java version "1.8.0_20"
> Java(TM) SE Runtime Environment (build 1.8.0_20-b26)
> Java HotSpot(TM) 64-Bit Server VM (build 25.20-b23, mixed mode)
> 
> And my flink-conf.yaml:
> 
> env.java.home: /usr/java/default
> jobmanager.rpc.address: internal.stream01.denma.ggportal.net
> jobmanager.rpc.port: 6123
> jobmanager.heap.mb: 2048
> taskmanager.heap.mb: 8192
> taskmanager.memory.off-heap: true
> taskmanager.numberOfTaskSlots: 4
> taskmanager.memory.preallocate: false
> parallelism.default: 2
> jobmanager.web.port: 8081
> jobmanager.web.submit.enable: true
> state.backend: filesystem
> state.backend.fs.checkpointdir: s3a://denma.live/flink/datum/checkpoints
> taskmanager.network.numberOfBuffers: 8192
> taskmanager.tmp.dirs: /opt/flink/var/tmp
> fs.hdfs.hadoopconf: /opt/flink/conf/
> recovery.mode: zookeeper
> recovery.zookeeper.quorum: ....
> recovery.zookeeper.storageDir: s3a://denma.live/flink/datum/recovery
> recovery.jobmanager.port: 50000-50100
> recovery.zookeeper.path.root: /flink
> blob.server.port: 50100-50200
> blob.storage.directory: /opt/flink/var/tmp/flink-blob
> taskmanager.rpc.port: 6122
> taskmanager.data.port: 6121
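
Part of the memory used outside -Xmx can be estimated from this configuration. A minimal sketch, assuming the default memory segment size of 32768 bytes (taskmanager.memory.segment-size is not set above), an assumed fallback of 2048 buffers, and hypothetical helper names; whether network buffers live on or off the heap depends on the Flink version and settings, so treat the result as a rough bound rather than an exact figure:

```python
DEFAULT_SEGMENT_SIZE_BYTES = 32768  # assumed default; not set in the config above
DEFAULT_NUM_BUFFERS = 2048          # assumed default when the key is absent


def parse_flink_conf(text):
    """Parse simple 'key: value' lines, ignoring blanks and '#' comments."""
    conf = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or ": " not in line:
            continue
        key, value = line.split(": ", 1)
        conf[key.strip()] = value.strip()
    return conf


def network_buffer_mib(conf):
    """Estimate the memory taken by network buffers, in MiB."""
    buffers = int(conf.get("taskmanager.network.numberOfBuffers",
                           DEFAULT_NUM_BUFFERS))
    segment = int(conf.get("taskmanager.memory.segment-size",
                           DEFAULT_SEGMENT_SIZE_BYTES))
    return buffers * segment / (1024 * 1024)


sample = """
taskmanager.heap.mb: 8192
taskmanager.memory.off-heap: true
taskmanager.network.numberOfBuffers: 8192
"""
conf = parse_flink_conf(sample)
print(network_buffer_mib(conf))  # 8192 buffers * 32 KiB = 256.0 MiB
```

With 8192 buffers this comes to 256 MiB, a fixed amount that is small next to the 8192 MB heap, so network buffers on their own would not explain growth that ends in an OOM kill.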
> 
> I need help. What should I do?
> Thanks in advance.
> 
> 
> 
> 
> 

