From: Till Rohrmann <trohrmann@apache.org>
Date: Mon, 9 Oct 2017 17:12:08 +0200
Subject: Re: Consult about flink on mesos cluster
To: yubo <yubo1983@gmail.com>
Cc: "Tzu-Li (Gordon) Tai" <tzulitai@apache.org>, user <user@flink.apache.org>, Eron Wright

Hi Bo,

you can still use Flink with Marathon, because Marathon will only schedule the cluster entrypoint, which is the MesosApplicationMasterRunner. Everything else will be scheduled via Fenzo. Moreover, by using Marathon you gain high availability, because Marathon makes sure that the ApplicationMaster is restarted in case of a failure.
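For reference, a Marathon app definition for the entrypoint could look roughly like the following. This is only a sketch: the app id, the path to mesos-appmaster.sh, and the resource sizes are placeholders you would adapt to your setup.

{
  "id": "flink-appmaster",
  "cmd": "/usr/local/flink/bin/mesos-appmaster.sh -Djobmanager.heap.mb=1024 -Dtaskmanager.heap.mb=1024 -Dtaskmanager.numberOfTaskSlots=32",
  "cpus": 1.0,
  "mem": 1536,
  "instances": 1
}

Marathon keeps this single instance running and restarts it if it fails; the task managers themselves are then brought up by Flink via Fenzo from the Mesos offers.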
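Note that for a restarted ApplicationMaster to also recover its jobs, ZooKeeper-based high availability should be enabled in flink-conf.yaml. A minimal sketch, assuming an existing ZooKeeper quorum and a shared storage directory (both values are placeholders for your environment):

high-availability: zookeeper
# placeholder: address of your ZooKeeper quorum
high-availability.zookeeper.quorum: zk-host:2181
# placeholder: shared directory for JobManager metadata
high-availability.storageDir: hdfs:///flink/ha/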
Cheers,
Till

On Mon, Oct 9, 2017 at 2:59 PM, yubo <yubo1983@gmail.com> wrote:

> Thanks for your reply, Till.
> We will use it without Marathon, and hope the PR is merged into the latest version soon.
>
> Best regards,
> Bo
>
> On Oct 9, 2017, at 6:36 PM, Till Rohrmann <trohrmann@apache.org> wrote:
>
> Hi Bo,
>
> Flink internally uses Fenzo to match tasks and offers. Fenzo does not support the Marathon constraints syntax you are referring to. At the moment, Flink only allows you to define hard host attribute constraints, which means that you define a host attribute that has to match exactly. Fenzo also supports constraints that operate on a set of tasks [1], but this is not yet exposed to the user. With that you should be able to evenly spread your tasks across multiple machines.
>
> There is actually a PR [2] trying to add this functionality. However, it is not yet in shape to be merged.
>
> [1] https://github.com/Netflix/Fenzo/wiki/Constraints#constraints-that-operate-on-groups-of-tasks
> [2] https://github.com/apache/flink/pull/4628
>
> Cheers,
> Till
>
> On Fri, Oct 6, 2017 at 10:54 AM, Tzu-Li (Gordon) Tai <tzulitai@apache.org> wrote:
>
>> Hi Bo,
>>
>> I'm not familiar with Mesos deployments, but I'll forward this to Till or Eron (in CC) who perhaps could provide some help here.
>>
>> Cheers,
>> Gordon
>>
>> On 2 October 2017 at 8:49:32 PM, Bo Yu (yubo1983@gmail.com) wrote:
>>
>> Hello all,
>> This is Bo. I ran into some problems when I tried to use Flink on my Mesos cluster (1 master, 2 slaves, each with a 32-core CPU).
>> I tried to start mesos-appmaster.sh in Marathon, and the job manager starts without problems:
>>
>> mesos-appmaster.sh -Djobmanager.heap.mb=1024 -Dtaskmanager.heap.mb=1024 -Dtaskmanager.numberOfTaskSlots=32
>>
>> My problem is that the task managers are all located on a single slave.
>> 1. (log1)
>> The initial tasks in "/usr/local/flink/conf/flink-conf.yaml" are set as "mesos.initial-tasks: 2".
>> I also set "mesos.constraints.hard.hostattribute: rack:ak09-27", which is the master node of the Mesos cluster.
>>
>> 2. (log2)
>> I tried many ways to distribute the tasks to all the available slaves, without any success.
>> So I decided to try adding a GROUP_BY operator, which I referenced from https://mesosphere.github.io/marathon/docs/constraints.html:
>> "mesos.constraints.hard.hostattribute: rack:ak09-27,GROUP_BY:2"
>> According to the log, Flink keeps waiting for more offers and the tasks are never launched.
>>
>> Sorry, I am a newbie to Flink and also to Mesos. Please reply if my problem is not clear, and I would appreciate any hint on how to distribute tasks evenly across the available resources.
>>
>> Thank you in advance.
>>
>> Best regards,
>>
>> Bo