airavata-dev mailing list archives

From Renan DelValle <rdelv...@binghamton.edu>
Subject Re: Mesos based meta-scheduling for Airavata
Date Sat, 12 Nov 2016 21:10:37 GMT
I haven't gotten to do that unfortunately. It's on my to-do list for my own
client.

Either way, I think you might get better info if you ask on one of the
Aurora mailing lists.

-Renan

On Thu, Oct 27, 2016 at 5:36 PM, Shenoy, Gourav Ganesh <goshenoy@indiana.edu> wrote:

> *@Renan*,
>
>
>
> I had a question – what is the default thrift port for aurora scheduler,
> which uses TBinaryProtocol?
>
>
>
> I have installed the Aurora-0.16 scheduler/executor on the Mesos-1.0.1
> cluster, and have only been able to use THttpClient over TJSONProtocol
> (port 8081). The Aurora site mentions that TBinaryProtocol is enabled as of
> version 0.16, but I have not been able to find the binary port. It would be
> great if you could provide some guidance here.
>
>
>
> Thanks and Regards,
>
> Gourav Shenoy
>
>
>
> *From: *Renan DelValle <rdelval1@binghamton.edu>
> *Reply-To: *"dev@airavata.apache.org" <dev@airavata.apache.org>
> *Date: *Thursday, October 27, 2016 at 4:31 PM
> *To: *Suresh Marru <smarru@apache.org>
> *Cc: *Airavata Dev <dev@airavata.apache.org>, Madhusudhan Govindaraju <
> mgovinda@binghamton.edu>
>
> *Subject: *Re: Mesos based meta-scheduling for Airavata
>
>
>
> I wish I had the bandwidth to help with this. I'll do my best to answer
> any pointed questions (if there are any) on the Aurora irc/slack chat.
>
> -Renan
>
>
>
> On Oct 17, 2016 11:38 PM, "Suresh Marru" <smarru@apache.org> wrote:
>
> Hi Renan,
>
>
>
> Since you did a similar exercise using Go [1], it will be nice to see your
> feedback and guidance on the discussions Gourav is summarizing below.
>
>
>
> Suresh
>
>
>
> [1] - http://markmail.org/thread/ymj7yqvvbhrjwv3s
>
>
>
> On Oct 17, 2016, at 11:32 PM, Shenoy, Gourav Ganesh <goshenoy@indiana.edu>
> wrote:
>
>
>
> Hi dev,
>
>
>
> Now that I have been able to get jobs scheduled via Aurora, I thought I
> should summarize my understanding. I would also like to briefly draw out
> the plan which I am working on with respect to using Mesos with Airavata.
>
>
>
> *Apache Aurora:*
>
>
>
> ·         Aurora, similar to Marathon & Chronos, is a service scheduler
> framework for Mesos. It has been built for scheduling long running services
> & cron jobs on Mesos.
>
> ·         The advantage with Aurora (over Marathon & Chronos) is that it
> works well for one-off jobs as well – i.e. If I want to run a job and get
> the output, Aurora is a better fit than Marathon & Chronos, since Marathon
> will never let the job exit (and keep restarting it on slaves) & Chronos is
> ONLY for crons.
>
> ·         Aurora also allows fine grained control of the jobs that need
> to be submitted – the concept of jobs, tasks, processes – a job can consist
> of one or more tasks, and a task can consist of one or more processes.
>
> ·         Aurora manages jobs that are made up of tasks; Mesos manages
> the tasks that consist of processes; Thermos (the Aurora executor)
> manages the processes.
>
> ·         We can control resource utilization at task level because of
> the above job abstractions that Aurora provides.
>
> ·         Among many other features, a useful one is the resource-quota
> management for users & the ability to support multiple users to run jobs.
>
>
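A minimal Python sketch of the job/task/process hierarchy described above. The class and field names are illustrative only; they are not Aurora's configuration DSL or its Thrift types. In particular, modeling a job's tasks as `instances` copies of one task template is an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Process:
    """A single command line; Thermos would be the component running it."""
    name: str
    cmdline: str


@dataclass
class Task:
    """A group of processes sharing one resource allocation."""
    name: str
    processes: List[Process]
    cpu: float      # cores requested for the whole task
    ram_mb: int
    disk_mb: int


@dataclass
class Job:
    """A task template replicated `instances` times (assumed model)."""
    role: str
    environment: str
    name: str
    task: Task
    instances: int = 1

    def key(self) -> str:
        # Hypothetical identifier built from role, environment and name.
        return f"{self.role}/{self.environment}/{self.name}"

    def total_cpu(self) -> float:
        # Resources are set per task, so the job total scales with instances.
        return self.task.cpu * self.instances


echo = Process(name="echo", cmdline="echo hello from airavata")
task = Task(name="echo_task", processes=[echo], cpu=0.5, ram_mb=128, disk_mb=64)
job = Job(role="airavata", environment="devel", name="echo_job", task=task, instances=2)
print(job.key(), job.total_cpu())
```

Because resources are attached to the task rather than to individual processes, this is the level at which utilization can be controlled.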
>
> *Current focus:*
>
>
>
> ·         I am currently working on building a Thrift based client for
> Aurora, and have been successful in implementing one, but with limited
> operations.
>
> ·         I will be adding support for more operations keeping them
> aligned to Airavata job submission/monitoring requirements.
>
> ·         I am currently focusing on targeting Airavata deployment to
> Mesos on a single cluster (e.g. AWS). The flow would look as follows:
>
> <image001.png>
>
> ·         As you can see, currently there is just a single Mesos cluster.
> The future focus would be to expand this to have multiple clusters.
>
>
>
> *Subsequent work:*
>
> ·         Once we are able to test Airavata deployment to single cluster
> successfully, we can expand this to a multi-cluster environment.
>
> ·         Here we would have multiple Mesos clusters, which would somehow
> need to be managed. The overall flow would look as follows:
>
> <image002.png>
>
>
>
> ·         We can either have multiple Mesos masters (for each individual
> cluster), that are connected to each other via VPN, or have a single master
> – in which case we would need to consider all other nodes as slaves.
>
> ·         This is a design issue which needs discussion, and Suresh has
> some ideas on how to do this.
>
>
>
> Thanks and Regards,
>
> Gourav Shenoy
>
>
>
> *From: *Suresh Marru <smarru@apache.org>
> *Reply-To: *"dev@airavata.apache.org" <dev@airavata.apache.org>
> *Date: *Friday, October 7, 2016 at 11:43 PM
> *To: *Airavata Dev <dev@airavata.apache.org>
> *Subject: *Re: Mesos based meta-scheduling for Airavata
>
>
>
> Hi Gourav,
>
>
>
> Thank you for the nice, informative summaries; posts like these are always
> educational. Keep 'em coming.
>
>
>
> Suresh
>
>
>
> On Oct 7, 2016, at 10:56 PM, Shenoy, Gourav Ganesh <goshenoy@indiana.edu>
> wrote:
>
>
>
> Hi dev,
>
>
>
> I have been exploring different frameworks for Mesos which would help our
> use-case of providing Airavata the capability to run jobs in a Mesos based
> ecosystem. In particular, I have been playing around with Marathon &
> Chronos and I am now going to be working on Apache Aurora.
>
>
>
> I have summarized my understanding about Mesos, Marathon & Chronos below.
> I will send out a separate email about Aurora later.
>
>
>
> *Apache Mesos:*
>
>
>
> ·         Apache Mesos is an open-source cluster manager, in the sense
> that it helps deploy & manage different frameworks (or applications) in a
> large clustered environment easily.
>
> ·         Mesos provides the ability to utilize the underlying shared pool
> of nodes as a single compute unit – that is, it can run many applications
> on these nodes efficiently.
>
> ·         Mesos uses the concept of “offers” for scheduling and running
> jobs on the underlying nodes. When a framework (application) wants to run
> computations/jobs on the cluster, Mesos will decide how many resources it
> will “offer” that framework based on the availability. The framework will
> then decide which resources to use from the offer, and subsequently run the
> computation/job on that resource.
>
> ·         In a typical cluster, you will have 3 or more Mesos masters &
> multiple Mesos slaves. Multiple Mesos masters help in providing high
> availability – if the leading master goes down, a new leader (master) is
> elected using ZooKeeper.
>
> ·         The task mentioned above of providing “offers” to frameworks is
> done by a master, whereas the slaves are the ones who run these
> computations.
>
>
>
> ·         Some additional points:
>
> o    I built a Mesos cluster with 3 masters & 2 slaves on EC2.
>
> o    Each master & slave has 1 GB of RAM & 1 vCPU with 20 GB of disk space.
>
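The offer cycle described above can be sketched in a few lines of Python. This is a toy model, not the Mesos scheduler API: `Offer`, `JobRequest`, `pick_offer`, and the "first fit" policy are all hypothetical names and choices made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Offer:
    """Free resources on one slave, as advertised by the master."""
    agent: str
    cpus: float
    mem_mb: int


@dataclass
class JobRequest:
    """What the framework needs in order to run one job."""
    name: str
    cpus: float
    mem_mb: int


def pick_offer(offers: List[Offer], job: JobRequest) -> Optional[Offer]:
    """Framework side: accept the first offer large enough for the job."""
    for offer in offers:
        if offer.cpus >= job.cpus and offer.mem_mb >= job.mem_mb:
            return offer
    return None  # nothing fits: decline and wait for the next round of offers


# Master side: one offer per slave with currently available resources.
offers = [
    Offer("slave-1", cpus=0.5, mem_mb=512),
    Offer("slave-2", cpus=1.0, mem_mb=1024),
]
job = JobRequest("one-off-job", cpus=1.0, mem_mb=768)
chosen = pick_offer(offers, job)
print(chosen.agent if chosen else "no suitable offer")
```

The point of the split is that the master only decides what to offer; the choice of which offer to use stays with the framework.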
>
>
> *Marathon:*
>
>
>
> ·         Marathon is considered a framework that runs on top of Mesos.
> It is a container orchestration platform for Mesos and essentially acts as
> a service scheduler.
>
> ·         It is named “marathon” because it is intended for long running
> applications. That is, Marathon makes sure that the service it is running
> never stops – if a service goes down or the slave on which the service is
> run dies, marathon keeps re-starting it on different slaves.
>
> ·         In some sense Marathon is very good for ensuring high
> availability of services. That is, instead of running services directly on
> Mesos, run them in Marathon if you never want them to die.
> *Note*: You can decide to run a service on multiple slave nodes and if
> resources on these slaves are available, Mesos will “offer” them to
> Marathon.
>
> ·         It is called a container orchestration platform because it
> “launches” these services inside a container – either Docker OR Mesos
> container.
>
> ·         In my opinion it is not a suitable “job scheduler” for Airavata
> because in Airavata we need to run a job and get the output rather than
> keeping it running always. Instead, we can run other schedulers –
> chronos/aurora as a service in Marathon.
>
> *Chronos:*
>
>
>
> ·         Chronos is a Cron scheduler for Mesos. It is good for running
> scheduled jobs – jobs that need to be run for a certain number of times,
> repeatedly after certain intervals.
>
> ·         Chronos also provides the ability to add dependencies between
> jobs – That is, if job1 is dependent on job2, then it will run job2
> first and then run job1 after job2 completes. It also builds a
> Directed Acyclic Graph (DAG) based on these dependencies.
>
> ·         Similar to Marathon, Chronos receives “offers” from Mesos
> master whenever it needs to run a job on Mesos.
>
> ·         Again, I found that Chronos does not fit the Airavata use-case
> since I could not find a way to run one-off jobs via Chronos – you need to
> specify interval time for Chronos, & Chronos then re-runs the job after
> that interval is complete (even if you decide to specify num. of
> repetitions=1).
>
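To illustrate dependency-based ordering, where a job runs only after every job it depends on has completed, here is a small sketch using Python's standard `graphlib` module (Python 3.9+). The job names are made up, and this shows only the DAG ordering idea, not how Chronos implements it internally.

```python
from graphlib import CycleError, TopologicalSorter

# Map each job to the set of jobs it depends on (its parents).
dependencies = {
    "job1": {"job2"},            # job1 depends on job2
    "job3": {"job1", "job2"},    # job3 depends on both
}

# A valid run order places every job after all of its parents.
run_order = list(TopologicalSorter(dependencies).static_order())
print(run_order)  # ['job2', 'job1', 'job3']


def has_cycle(deps: dict) -> bool:
    """Return True if the dependency graph is not a DAG."""
    try:
        TopologicalSorter(deps).prepare()
        return False
    except CycleError:
        return True


print(has_cycle({"a": {"b"}, "b": {"a"}}))  # True
```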
>
>
>
>
> Some additional points:
>
> ·         Marathon & Chronos both have REST API support – eg: you can
> submit jobs via APIs along with other interactions such as list jobs, etc.
>
> ·         I installed Marathon & Chronos frameworks on the Mesos master
> nodes. This is what their health looks like on the Mesos dashboard:
>
>
>
> <image002.png>
>
>                 As you can see, there are 3 active tasks running in
> Chronos & 4 active tasks (long running) in Marathon.
>
>
>
> ·         I also installed Chronos as a service inside Marathon, and this
> is what it looks like in the Marathon UI:
>
> <image004.png>
>
> Interestingly, Chronos (as a service in Marathon) was smart enough to
> identify the jobs submitted via Chronos (as a framework on Mesos) &
> vice-versa.
>
>
>
> ·         Also, the Mesos dashboard lists the active tasks it is running &
> details about which slave each task is running on. It also lists Completed
> tasks. The “Sandbox” gives you access to the stdout/stderr files for the
> tasks as well as any other directories that were created as part of the
> task.
>
> <image005.png>
>
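As an example of the REST support mentioned above, the sketch below builds (but does not send) a request that would create an app through Marathon's `POST /v2/apps` endpoint. The host and port in `MARATHON_URL`, the app id, and the helper name `build_app_request` are assumptions for illustration; adjust them to your own cluster.

```python
import json
from urllib.request import Request

# Assumption: Marathon is reachable locally on port 8080.
MARATHON_URL = "http://localhost:8080"


def build_app_request(app_id: str, cmd: str, cpus: float = 0.1,
                      mem: int = 32, instances: int = 1) -> Request:
    """Build a POST /v2/apps request describing a command-based app."""
    payload = {
        "id": app_id,
        "cmd": cmd,
        "cpus": cpus,
        "mem": mem,            # MB
        "instances": instances,
    }
    return Request(
        url=f"{MARATHON_URL}/v2/apps",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_app_request("/hello-airavata", "echo hello && sleep 60")
print(request.get_method(), request.full_url)

# To actually submit against a running Marathon instance:
#   from urllib.request import urlopen
#   with urlopen(request) as response:
#       print(response.status)
```

Note that Marathon treats the app as a long-running service, so once the command exits it will be restarted, which is the behavior described earlier in this message.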
>
>
> Pardon me for this long email. Next, I will explore Apache Aurora, which
> seems a better fit for the Airavata use-case because it provides the
> features that Chronos supports and can also run one-off jobs.
>
>
>
> Thanks and Regards,
>
> Gourav Shenoy
>
>
>
> *From: *"Shenoy, Gourav Ganesh" <goshenoy@indiana.edu>
> *Reply-To: *"dev@airavata.apache.org" <dev@airavata.apache.org>
> *Date: *Friday, September 23, 2016 at 4:43 PM
> *To: *"dev@airavata.apache.org" <dev@airavata.apache.org>
> *Subject: *Mesos based meta-scheduling for Airavata
>
>
>
> Hi Dev,
>
>
>
> I am working on this project of building a Mesos based meta-scheduler for
> Airavata, along with Shameera & Mangirish. Here is the jira link:
> https://issues.apache.org/jira/browse/AIRAVATA-2082.
>
>
>
> ·         We have identified some tasks that would be needed for
> achieving this, and at the higher level it would consist of:
>
> 1.      Resource provisioning – We need to provision resources on cloud &
> hpc infrastructures such as EC2, Jetstream, Comet, etc.
>
> 2.      Building a cluster – Deploying a Mesos cluster on set of nodes
> obtained from (1) above for task management.
>
> 3.      Selecting a scheduler – We need to investigate the scheduler to
> use with the Mesos cluster. Some of the options are Marathon and Aurora,
> but we need to find one that suits our needs of running serial as well as
> parallel (MPI) jobs.
>
> 4.      Installing & running applications on this cluster – Once the
> cluster has been deployed and a scheduler choice made, we need to be able
> to install and run applications on this cluster using Airavata.
>
>
>
> ·         Until now we were able to look into the following:
>
> o   Resource provisioning:
>
> §  We explored several options of provisioning resources – using cloud
> libraries as well as via ansible scripts.
>
> §  We built an OpenStack4J Java module which would provision instances on
> OpenStack based clouds (e.g. Jetstream).
>
> §  We also built a CloudBridge Python module for provisioning EC2
> instances on Amazon. CloudBridge can also be used to provision instances on
> OpenStack.
>
> §  We wrote Ansible scripts for bringing up instances on both AWS and
> OpenStack based clouds.
>
>
>
> §  *Key Points*: CloudBridge and OpenStack4J are powerful libraries for
> resource provisioning, but currently they do single-instance provisioning,
> and do not support templated boot options such as CloudFormation (for AWS)
> & Heat (for OpenStack).
>
>
>
> o   Building a cluster:
>
> §  We wrote an Ansible script for deploying a Mesos-Marathon cluster on a
> set of nodes. This script will install necessary dependencies such as
> Zookeeper.
>
> §  We tested this on OpenStack based clouds & on EC2.
>
> §  OpenStack Magnum provides excellent support for doing resource
> provisioning & deploying mesos cluster, but we are running into some
> problems while trying it.
>
>
>
> o   Installing a scheduler:
>
> §  Our Ansible script is currently installing Marathon as the scheduler
> on Mesos. We haven’t yet submitted jobs using Marathon.
>
>
>
> ·         Although this is not finalized, we are inclined towards the
> Ansible approach for the above, as Ansible also provides Python APIs, which
> will allow us to integrate it with Airavata via Thrift. Hence we will be
> able to invoke the Ansible scripts from code without needing to use the
> command-line interface.
>
>
>
> ·         We are also progressively working on some work-items such as:
>
> o   Exploring options to provision and deploy a Mesos-Marathon cluster on
> HPC systems such as Comet. The challenge would be to use Ansible to
> provision resources and deploy the cluster. Once we have a cluster, we can
> try running applications.
>
> o   Exploring different scheduler options for running serial and parallel
> (MPI) jobs on such heterogeneous clusters.
>
> o   Exploring orchestration options such as OpenStack Heat, AWS
> CloudFormation, OpenStack Magnum, etc.
>
>
>
> Any suggestions and comments are highly appreciated.
>
>
>
> Thanks and Regards,
>
> Gourav Shenoy
>
>
>
>
>
>
