From: Vishal Santoshi <vishal.santoshi@gmail.com>
Date: Wed, 18 Jul 2018 22:52:17 -0400
Subject: Re: Flink 1.5 job distribution over cluster nodes
To: scarmeli
Cc: user@flink.apache.org
For example, state size / logs etc. are all now on one physical node (or a couple, rather than spread out) for that one pipe where we desire a large state. We have decreased the slots (a pretty artificial setup) to give each node less stress. I wish there were an RFC from the wider community on this, IMveryHO.

On Wed, Jul 18, 2018 at 9:43 AM, Vishal Santoshi wrote:

> I think there is something to be said about making this distribution more
> flexible. A standalone cluster is still the distribution mechanism for many
> installations, and it suffers horribly with the above approach. A healthy
> cluster requires resources to be used equitably where possible. I have some
> pipes that are CPU intense and some not so much. Slots, being the primary
> unit of parallelism, do not differentiate much between a stress-inducing
> pipe and the others, so having them spread out becomes essential to avoid
> skews.
>
> Maybe a simple setting could have forked different approaches to slot
> distribution, standalone = true for example. I have an n-node cluster with
> 2 nodes being heavily stressed (load average approaching the slot count)
> because of this new setup, and that does not seem right. It is still a
> physical node that these processes run on, with local drives and so on.
>
>
> On Wed, Jul 18, 2018, 7:54 AM scarmeli wrote:
>
>> Answered in another mailing list:
>>
>> Hi Shachar,
>>
>> with Flink 1.5 we added resource elasticity. This means that Flink is now
>> able to allocate new containers on a cluster management framework like
>> Yarn or Mesos. Due to these changes (which also apply to the standalone
>> mode), Flink no longer reasons about a fixed set of TaskManagers, because
>> if needed it will start new containers (this does not work in standalone
>> mode). Therefore, it is hard for the system to make any decisions about
>> spreading the slots belonging to a single job out across multiple TMs. It
>> gets even harder when you consider that some jobs like yours might benefit
>> from such a strategy, whereas others would benefit from co-locating their
>> slots. It gets even more complicated if you want to do scheduling with
>> respect to multiple jobs, which the system does not have full knowledge
>> about because they are submitted sequentially. Therefore, Flink currently
>> assumes that slot requests can be fulfilled by any TaskManager.
>>
>> Cheers,
>> Till
>>
>>
>>
>> --
>> Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/
>
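The slot reduction described in the thread corresponds to the per-TaskManager slot setting in `flink-conf.yaml`. A minimal sketch of that knob, with illustrative values not taken from the thread:

```yaml
# flink-conf.yaml on each standalone TaskManager -- values are illustrative.
# Fewer slots per TaskManager forces a job's subtasks onto more machines
# (since each TM can host fewer of them), at the cost of lower per-node
# utilization; more slots concentrate work on fewer nodes.
taskmanager.numberOfTaskSlots: 2

# Default parallelism used when a job does not set its own.
parallelism.default: 8
```

Note this only caps how much work one node can take; as Till explains above, Flink 1.5 still makes no guarantee about *which* TaskManagers a given job's slot requests land on.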