From: Manuel Montesino
To: "user@flink.apache.org"
CC: Product-Flow
Subject: Problems with taskmanagers in Mesos Cluster
Date: Wed, 18 Oct 2017 13:52:00 +0000

Hi,

We have deployed a Mesos cluster with Marathon, and we deploy Flink sessions through Marathon with multiple taskmanagers configured. Sometimes, in the earlier stages, we change settings such as memory in the Marathon JSON. When we then redeploy the Flink session, the jobmanagers stop and start with the new configuration, but the taskmanagers do not pick it up: they keep running with the configuration they were started with. So we have to kill/stop the Docker container of each taskmanager task.

Is there a way to kill or stop the taskmanagers when the session is redeployed?

Some environment configuration from the Marathon JSON file related to the taskmanagers:

```
"flink_akka.ask.timeout": "1min",
"flink_akka.framesize": "102400k",
"flink_high-availability": "zookeeper",
"flink_high-availability.zookeeper.path.root": "/flink",
"flink_jobmanager.web.history": "200",
"flink_mesos.failover-timeout": "86400",
"flink_mesos.initial-tasks": "16",
"flink_mesos.maximum-failed-tasks": "-1",
"flink_mesos.resourcemanager.tasks.container.type": "docker",
"flink_mesos.resourcemanager.tasks.mem": "6144",
"flink_metrics.reporters": "jmx",
"flink_metrics.reporter.jmx.class": "org.apache.flink.metrics.jmx.JMXReporter",
"flink_state.backend": "org.apache.flink.contrib.streaming.state.RocksDBStateBackendFactory",
"flink_taskmanager.maxRegistrationDuration": "10 min",
"flink_taskmanager.network.numberOfBuffers": "8192",
"flink_jobmanager.heap.mb": "768",
"flink_taskmanager.debug.memory.startLogThread": "true",
"flink_mesos.resourcemanager.tasks.cpus": "1.3",
"flink_env.java.opts.taskmanager": "-XX:+UseG1GC -XX:MaxGCPauseMillis=200 -XX:ConcGCThreads=1 -XX:InitiatingHeapOccupancyPercent=35 -XX:G1HeapRegionSize=16M -XX:MinMetaspaceFreeRatio=50 -XX:MaxMetaspaceFreeRatio=80 -XX:+DisableExplicitGC -Djava.awt.headless=true -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=5 -XX:GCLogFileSize=10M",
"flink_containerized.heap-cutoff-ratio": "0.67"
```
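
As an illustration only: the sketch below assumes that the container entrypoint turns each `flink_`-prefixed environment variable into a `flink-conf.yaml` entry by stripping the prefix. Nothing in this message confirms that mechanism, and `env_to_flink_conf` is a hypothetical helper, not part of Flink or Marathon. It only shows which settings a taskmanager would see if the mapping worked that way.

```python
# Assumption: the Docker entrypoint strips a "flink_" prefix from environment
# variable names and writes the remainder as "key: value" to flink-conf.yaml.
PREFIX = "flink_"


def env_to_flink_conf(env):
    """Return flink-conf.yaml lines for every variable whose name starts with PREFIX."""
    lines = []
    for name, value in sorted(env.items()):
        if name.startswith(PREFIX):
            # Drop the prefix; the rest is used verbatim as the Flink config key.
            lines.append(f"{name[len(PREFIX):]}: {value}")
    return lines


# A few entries copied from the Marathon JSON above, plus one unrelated variable.
marathon_env = {
    "flink_mesos.resourcemanager.tasks.mem": "6144",
    "flink_mesos.resourcemanager.tasks.cpus": "1.3",
    "flink_mesos.initial-tasks": "16",
    "PATH": "/usr/bin",
}

for line in env_to_flink_conf(marathon_env):
    print(line)
```

Under that assumption, changing `flink_mesos.resourcemanager.tasks.mem` in Marathon would only affect processes whose entrypoint runs again, which matches the behaviour described above: the jobmanager restarts, the already-running taskmanagers do not.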

Thanks in advance and kind regards,

Manuel Montesino
Devops Engineer
E manuel.montesino@piksel(dot)com

Marie Curie, 1. Ground Floor. Campanillas, Malaga 29590

liberating viewing | piksel.com

This message is private and confidential. If you have received this message in error, please notify the sender or servicedesk@piksel.com and remove it from your system.

Piksel Inc is a company registered in the United States, 2100 Powers Ferry Road SE, Suite 400, Atlanta, GA 30339
