From: Mangirish Wagle <vaglomangirish@gmail.com>
Date: Fri, 28 Oct 2016 12:46:58 -0400
Subject: Re: mesos and moving jobs between clusters
To: dev@airavata.apache.org

Hi Pankaj,

I was curious to know your motivation for developing a custom framework
rather than using Aurora or another existing, robust framework. It would
be great if you could share some pointers on that.

I would also like to know which specific use cases you are targeting with
your framework, which stability concerns you have identified, and how you
plan to handle them.

Regards,
Mangirish

On Tue, Oct 25, 2016 at 6:09 PM, Pankaj Saha <psaha4@binghamton.edu> wrote:

> Hi Mark,
>
> Mesos collects the resource information from all the nodes in the
> cluster (cores, memory, disk, and GPUs) and presents a unified view, as
> if the cluster were a single operating system. Mesosphere, the
> commercial entity behind Mesos, has built an ecosystem around Mesos as
> the kernel, called the "Data Center Operating System" (DCOS). Frameworks
> interact with Mesos to reserve resources and then use those resources to
> run jobs on the cluster. So, for example, if multiple frameworks such as
> Marathon, Apache Aurora, and a custom MPI framework are using Mesos,
> there is a negotiation between Mesos and each framework over how many
> resources each framework gets. Once a framework, say Aurora, gets
> resources, it decides how to use them. Strengths of Mesos include fault
> tolerance at scale and the ability to co-schedule applications and
> frameworks on the cluster so that cluster utilization stays high.
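For anyone on the list who wants to see what that negotiation looks like
concretely, below is a minimal sketch against the stock Mesos v1 scheduler
HTTP API — not the custom framework Pankaj's team is building. It
subscribes as a framework and then declines every offer it receives, which
is the skeleton any scheduler builds on; the master address and framework
name are hypothetical, and a real framework would answer some offers with
an ACCEPT call carrying LAUNCH operations instead of declining.

```python
import json
import requests

MASTER = "http://mesos-master.example.org:5050"  # hypothetical master address

def records(stream):
    """Yield JSON events from Mesos's RecordIO stream: '<length>\\n<payload>'."""
    buf = b""
    for chunk in stream.iter_content(chunk_size=1024):
        buf += chunk
        while b"\n" in buf:
            length, _, rest = buf.partition(b"\n")
            n = int(length.decode("ascii"))
            if len(rest) < n:
                break  # wait for the rest of this record
            buf = rest[n:]
            yield json.loads(rest[:n])

# SUBSCRIBE opens a long-lived connection on which events stream back.
resp = requests.post(
    MASTER + "/api/v1/scheduler",
    json={"type": "SUBSCRIBE",
          "subscribe": {"framework_info": {"user": "root", "name": "sketch"}}},
    headers={"Accept": "application/json"},
    stream=True,
)
stream_id = resp.headers["Mesos-Stream-Id"]  # required on every later call

framework_id = None
for event in records(resp):
    if event["type"] == "SUBSCRIBED":
        framework_id = event["subscribed"]["framework_id"]
    elif event["type"] == "OFFERS":
        offer_ids = [o["id"] for o in event["offers"]["offers"]]
        # A real framework would ACCEPT some offers with LAUNCH operations;
        # declining hands the resources back to Mesos's allocator.
        requests.post(
            MASTER + "/api/v1/scheduler",
            json={"framework_id": framework_id,
                  "type": "DECLINE",
                  "decline": {"offer_ids": offer_ids,
                              "filters": {"refuse_seconds": 5.0}}},
            headers={"Mesos-Stream-Id": stream_id},
        )
```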
> Mesos off the shelf only works when the master and agent nodes have a
> line of communication to each other. We have worked on modifying the
> Mesos installation so that it works even when agents are behind
> firewalls on campus clusters. We are also working on getting the same
> setup to work on Jetstream and Chameleon, where allocations are a mix of
> public IPs and internally accessible nodes. This will allow us to use
> Mesos to meta-schedule across clusters. We are also developing our own
> framework, to be able to customize scheduling and resource negotiation
> for science gateways on Mesos clusters (the stock agent flags relevant
> to such setups are sketched at the end of this message). Our plan is to
> work with Suresh and Marlon's team so that it works with Airavata.
>
> I will be presenting at the Gateways workshop in November, and then I
> will also be at SC along with my adviser (Madhu Govindaraju), if you
> would like to discuss any of these projects.
>
> We are working on packaging our work so that it can be shared with this
> community.
>
> Thanks
>
> Pankaj
>
> On Tue, Oct 25, 2016 at 11:36 AM, Mangirish Wagle
> <vaglomangirish@gmail.com> wrote:
>
>> Hi Mark,
>>
>> Thanks for your question. If I understand you correctly, you need a
>> kind of load balancing between identical clusters through a single
>> Mesos master?
>>
>> With the current setup, from what I understand, we have a separate
>> Mesos master for every cluster on separate clouds. However, it is a
>> good investigative topic whether a single Mesos master can target
>> multiple identical clusters. We have some work ongoing to use a
>> virtual-cluster setup with compute resources across clouds to install
>> Mesos, but I am not sure if that is what you are looking for.
>>
>> Regards,
>> Mangirish
>>
>> On Tue, Oct 25, 2016 at 11:05 AM, Miller, Mark <mmiller@sdsc.edu> wrote:
>>
>>> Hi all,
>>>
>>> I posed a question to Suresh (see below), and he asked me to put this
>>> question on the dev list. So here it is. I will be grateful for any
>>> comments about the issues you all are facing and what has come up in
>>> trying this, as it seems likely that this is a much simpler problem in
>>> concept than in practice, though its solution has many benefits.
>>>
>>> Here is my question:
>>>
>>> A group of us have been discussing how we might simplify submitting
>>> jobs to different compute resources in our current implementation of
>>> CIPRES, and how cloud computing might facilitate this. But none of us
>>> are cloud experts. As I understand it, the Mesos cluster I have been
>>> seeing in the Airavata email threads is intended to make it possible
>>> to deploy jobs to multiple virtual clusters. I am (we are) wondering
>>> whether Mesos manages submissions to identical virtual clusters on
>>> multiple machines, and whether that works efficiently.
>>>
>>> In our implementation, we have to change the rules to run efficiently
>>> on different machines, according to GPU availability and cores per
>>> node. I am wondering how Mesos and virtual clusters affect those
>>> considerations. Can Mesos create basically identical virtual clusters
>>> independent of the machine?
>>>
>>> Thanks for any advice.
>>>
>>> Mark
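Regarding the firewalled-agent setup Pankaj describes above: with a stock
installation, each agent must reach the master and advertise an address
the master can connect back to. A minimal sketch follows; the host names
and addresses are hypothetical, and note that --advertise_ip only covers
agents behind NAT that still allow inbound connections — agents with no
inbound path at all are what the modified installation addresses.

```sh
# Minimal sketch (hypothetical hosts/addresses): a Mesos agent behind NAT
# that still has a routable path back to the master. --advertise_ip and
# --advertise_port tell the master where to reach the agent, which on a
# NATed node differs from the agent's private address.
mesos-agent \
  --master=zk://zk1.example.org:2181/mesos \
  --work_dir=/var/lib/mesos \
  --hostname=agent1.campus.example.org \
  --advertise_ip=203.0.113.17 \
  --advertise_port=5051
```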