From: Paul Lam
Date: Wed, 17 Oct 2018 14:35:32 +0800
To: "Zhijiang(wangzhijiang999)"
Cc: jpreisner, user
Subject: Re: Need help to understand memory consumption

Hi Zhijiang,

Does the memory management apply to streaming jobs as well? A previous post [1] said that it can only be used with the batch API, but I might have missed some updates on that. Thank you!

[1] https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=53741525

Best,
Paul Lam

On 17 Oct 2018, at 13:39, Zhijiang(wangzhijiang999) <wangzhijiang999@aliyun.com> wrote:

Hi Julien,

By default, Flink manages 70% of the free memory in the TaskManager so that it can cache data efficiently, as described in the article you mentioned, "https://flink.apache.org/news/2015/05/11/Juggling-with-Bits-and-Bytes.html". Once allocated, this managed memory stays resident and remains referenced by the MemoryManager, so it lives in the old generation of the JVM heap and is never reclaimed by the GC. This way we avoid the cost of repeatedly creating and garbage-collecting these objects.
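
For a rough sense of scale, the fraction rule can be written out as arithmetic. This is a minimal sketch, assuming managed memory is simply `fraction x free heap`, with the fraction being what Flink 1.x exposes as `taskmanager.memory.fraction` (default 0.7) and the free heap estimated the way a running JVM reports it (`maxMemory() - totalMemory() + freeMemory()`). Flink's real computation also subtracts network buffers first, so treat the numbers as approximations; the class and method names are hypothetical.

```java
/**
 * Hypothetical helper: approximates the managed-memory budget as a fixed
 * fraction of the JVM's free heap. This is NOT Flink's actual implementation.
 */
public class ManagedMemoryEstimate {

    /** Default fraction quoted in the thread (70%). */
    public static final double DEFAULT_FRACTION = 0.7;

    /** Managed bytes = fraction of the given free-heap size, rounded down. */
    public static long managedBytes(long freeHeapBytes, double fraction) {
        if (freeHeapBytes < 0 || fraction < 0.0 || fraction > 1.0) {
            throw new IllegalArgumentException("invalid heap size or fraction");
        }
        return (long) (freeHeapBytes * fraction);
    }

    /** Free heap as seen by this JVM: headroom up to -Xmx plus unused committed heap. */
    public static long currentFreeHeapBytes() {
        Runtime rt = Runtime.getRuntime();
        return rt.maxMemory() - rt.totalMemory() + rt.freeMemory();
    }

    public static void main(String[] args) {
        long tenGiB = 10L * 1024 * 1024 * 1024;
        // Upper bound for a 10 GB TaskManager: even if the whole heap were free,
        // roughly 7 GB of it would be handed to the MemoryManager.
        System.out.printf("10 GiB free heap -> %.2f GiB managed%n",
                managedBytes(tenGiB, DEFAULT_FRACTION) / (1024.0 * 1024 * 1024));
        // The same rule applied to whatever heap this demo JVM currently has free.
        System.out.printf("this JVM -> %d bytes managed%n",
                managedBytes(currentFreeHeapBytes(), DEFAULT_FRACTION));
    }
}
```

With Julien's 10 GB TaskManager this gives an upper bound of about 7 GB of long-lived managed memory, before network buffers are taken into account.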

The parameter "taskmanager.memory.preallocate" defaults to false, which means this managed memory is not allocated when the TaskManager starts. Once a job is running, its tasks request managed memory, and you see memory consumption rise. When the job is cancelled, the managed memory is released back to the MemoryManager but not reclaimed by the GC, so you see no drop in memory consumption. After you restart the TaskManager, the initial memory consumption is low again because of the lazy allocation configured by taskmanager.memory.preallocate=false.
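
To make the lazy-allocation behaviour described above concrete, here is a toy model in plain Java. It is only an illustration of the mechanism as described in this reply, not Flink's MemoryManager: the class `PooledSegmentManager`, its methods and the segment size are hypothetical. Segments are created on first request (analogous to `taskmanager.memory.preallocate=false`), returned to a pool on release, and kept referenced there, so the GC cannot reclaim them after a "job" finishes.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Toy model of a lazily allocating, pooling memory manager (hypothetical, not Flink's API). */
public class PooledSegmentManager {
    private final int segmentSize;
    private final int maxSegments;
    private final Deque<byte[]> pool = new ArrayDeque<>(); // released segments stay referenced here
    private int allocatedSegments = 0;                       // segments ever created

    public PooledSegmentManager(int segmentSize, int maxSegments) {
        this.segmentSize = segmentSize;
        this.maxSegments = maxSegments;
    }

    /** Hands out n segments, reusing pooled ones first and allocating lazily otherwise. */
    public List<byte[]> allocate(int n) {
        List<byte[]> out = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            byte[] seg = pool.poll();
            if (seg == null) {
                if (allocatedSegments >= maxSegments) {
                    throw new IllegalStateException("managed memory exhausted");
                }
                seg = new byte[segmentSize]; // first use: the heap actually grows here
                allocatedSegments++;
            }
            out.add(seg);
        }
        return out;
    }

    /** Returns segments to the pool; they are NOT made eligible for garbage collection. */
    public void release(List<byte[]> segments) {
        pool.addAll(segments);
        segments.clear();
    }

    public int allocatedSegments() { return allocatedSegments; }

    public int pooledSegments() { return pool.size(); }

    public static void main(String[] args) {
        PooledSegmentManager mm = new PooledSegmentManager(32 * 1024, 100);
        List<byte[]> job1 = mm.allocate(60); // job runs: 60 segments created lazily
        mm.release(job1);                    // job cancelled: memory kept in the pool
        System.out.println("after cancel: allocated=" + mm.allocatedSegments()
                + ", pooled=" + mm.pooledSegments());
        List<byte[]> job2 = mm.allocate(60); // next job reuses the pooled segments
        System.out.println("after second job: allocated=" + mm.allocatedSegments()
                + ", pooled=" + mm.pooledSegments());
        mm.release(job2);
    }
}
```

The trade-off this models is the one described in the reply: reusing long-lived segments avoids GC churn, at the price that the memory is not given back while the TaskManager process is alive.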

Best,
Zhijiang
------------------------------------------------------------------
From: Paul Lam <paullin3280@gmail.com>
Sent: Wednesday, 17 October 2018 12:31
To: jpreisner <jpreisner@free.fr>
Cc: user <user@flink.apache.org>
Subject: Re: Need help to understand memory consumption


Hi Julien,

AFAIK, streaming jobs put data objects on the heap, so it depends on the JVM GC to release the memory.
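
Because heap objects are only reclaimed by the GC, it can help to compare what the heap actually uses with what the JVM has committed, i.e. reserved from the operating system. A tool like top reports the process's resident memory, which typically follows committed memory rather than live objects, and many JVMs do not hand committed heap back to the OS promptly after a collection. The following is a minimal sketch using the standard `java.lang.management` API; the class name `HeapReport` is hypothetical, and `System.gc()` is only a request that the JVM may ignore.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;

/** Prints heap "used" vs "committed" before and after dropping a large allocation. */
public class HeapReport {

    /** Current heap usage as reported by the platform MemoryMXBean. */
    public static MemoryUsage heap() {
        MemoryMXBean bean = ManagementFactory.getMemoryMXBean();
        return bean.getHeapMemoryUsage();
    }

    static void print(String label) {
        MemoryUsage u = heap();
        System.out.printf("%-10s used=%,d MiB committed=%,d MiB%n",
                label, u.getUsed() >> 20, u.getCommitted() >> 20);
    }

    public static void main(String[] args) {
        print("start");
        byte[][] data = new byte[32][];
        for (int i = 0; i < data.length; i++) {
            data[i] = new byte[1 << 20]; // 32 MiB of heap objects, standing in for records held by a task
        }
        print("allocated");
        data = null;  // drop the only reference, as when a task's objects become unreachable
        System.gc();  // a hint only; the JVM may or may not collect right now
        print("after gc");
        // "used" usually falls after a collection; "committed" often stays higher,
        // and committed memory is what an OS tool such as top tends to keep seeing.
    }
}
```

Running this next to top shows whether a drop in live heap objects is actually reflected in the process footprint on a given JVM and GC.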

Best,
Paul Lam

> On 12 Oct 2018, at 14:29, jpreisner@free.fr wrote:

> Hi,

> My use case is:
> - I use Flink 1.4.1 in a standalone cluster with 5 VMs (1 VM = 1 JobManager + 1 TaskManager).
> - I run N jobs per day. N may vary (one day N=20, another day N=50, ...). All jobs are the same. They connect to Kafka topics and have two DB2 connectors.
> - Depending on a special event, a job can restart itself via the command: bin/flink cancel <JobID>
> - At the end of the day, I cancel all jobs.
> - Each VM is configured with 16 GB of RAM.
> - The memory allocated to each TaskManager is configured as 10 GB.

> After several days, the memory saturates (we exceed 14 GB of used memory).

> I read the following posts, but they did not help me understand my problem:
> - https://flink.apache.org/news/2015/05/11/Juggling-with-Bits-and-Bytes.html
> - http://mail-archives.apache.org/mod_mbox/flink-user/201711.mbox/browser

> I did some tests on a machine (outside the cluster) with the top command, and this is what I concluded (please see the attached file Flink_memory.PNG):
> - When a job is started and running, it consumes memory.
> - When a job is cancelled, a large part of the memory is still used.
> - When another job is started and running (after cancelling the previous job), even more memory is consumed.
> - When I restart the JobManager and TaskManager, memory returns to normal.

> Why is the memory not released when a job is cancelled?

> I added another attachment, Graph.PNG, that shows the graph of a job.
> In case it is useful: we use MapFunction, FlatMapFunction, FilterFunction, triggers and windows, ...

> Thanks in advance,
> Julien
> <Flink_memory.xlsx><Graph.PNG><Flink_memory.PNG>

