airavata-dev mailing list archives

From "Shenoy, Gourav Ganesh" <goshe...@indiana.edu>
Subject Re: Apache Aurora Scheduler APIs (Thrift)
Date Wed, 19 Oct 2016 18:08:09 GMT
Hi Mangirish,

I have already shared the details of this setup with you. This cluster is the same one I used
to test Mesos-Marathon-Chronos. Let me know if you face any problems or have any questions.

I have set up an Aurora client machine for running command-line tests, and you can also install
aurora-cli on your local machine if needed. I will share the client machine details with you on
HipChat.

Thanks and Regards,
Gourav Shenoy

From: Mangirish Wagle <vaglomangirish@gmail.com>
Reply-To: "dev@airavata.apache.org" <dev@airavata.apache.org>
Date: Tuesday, October 18, 2016 at 2:02 AM
To: "dev@airavata.apache.org" <dev@airavata.apache.org>
Subject: Re: Apache Aurora Scheduler APIs (Thrift)

Hi Gourav,
This is great. Can I get access to this setup to see whether I can submit a single-node
MPI job through it?
I would also like to try submitting jobs in batch, which could potentially be used for "gang
scheduling" to run MPI jobs.
Thanks and Regards,
Mangirish

On Mon, Oct 17, 2016 at 9:39 PM, Shenoy, Gourav Ganesh <goshenoy@indiana.edu> wrote:
Hi dev,

I was able to successfully build a “test” Thrift client for the Apache Aurora scheduler
running on the Mesos cluster I deployed (on EC2). I call it a “test” client since it is
not complete yet, and right now it only performs the following operations:

1.       Submit a one-off job to the Aurora scheduler.

2.       Monitor the status of a submitted job. The Thrift APIs also let us check whether any
jobs are pending and why a job is in the PENDING state. This tells us when resources
(e.g. CPUs) are insufficient, so that we can provision new ones if needed.

3.       Retrieve the list of running jobs.

Some details:


•         About the Thrift client

o    I cloned the Apache Aurora repository, which contains the "api.thrift" file that defines
the RPC structures we need for the client.

o    I generated client stubs from this "api.thrift" file, using the "thrift-maven plugin"
to generate the Java classes. The plugin directly creates a JAR with all the Thrift-generated
classes, which can be used as a library/dependency in our client project.

o    I initially tried connecting to the scheduler via a "TSocket" transport, and
spent a lot of time figuring out why this failed. Apparently, the current installation of
Aurora only exposes an HTTP endpoint (on port 8081).

o    I had to use "THttpClient" (instead of TSocket) and TJSONProtocol (instead of
TBinaryProtocol). I will be sending an email to the Aurora mailing list to find out how
to enable binary socket connections.


•         About operations implemented


o    Submit a one-off job

•  I was able to submit a job to Aurora, which then schedules it to run on Mesos.

•  A job in Aurora is uniquely identified by 3 parameters (collectively called the Job Key):
environment name (e.g. devel), role (e.g. centos), and job name (e.g. hello_world).

•  A typical job path looks like "example/centos/devel/hello_world", where example is the
name of our Mesos cluster, followed by the role, environment, and job name.

•  To submit a job, we need to know the resources it needs (CPUs, RAM, disk) and include
them in a task config, which also contains the command to run the application along with
other details.

•  The job submitted via the Thrift client ran successfully on Mesos.
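
As a rough illustration of the job key and job path described above, here is a minimal Java sketch. The class JobKeyPath and its methods are hypothetical (they are not part of Aurora or of the Thrift-generated stubs); the only thing taken from this thread is the cluster/role/environment/name ordering seen in the example path.

```java
/**
 * Hypothetical helper, not part of Aurora or its generated Thrift classes.
 * Models the three-part job key (role, environment, name) from this thread.
 */
final class JobKeyPath {
    final String role;
    final String environment;
    final String name;

    JobKeyPath(String role, String environment, String name) {
        this.role = requireSegment(role, "role");
        this.environment = requireSegment(environment, "environment");
        this.name = requireSegment(name, "name");
    }

    /** Rejects null, empty, or slash-containing segments so paths stay unambiguous. */
    private static String requireSegment(String value, String label) {
        if (value == null || value.isEmpty() || value.contains("/")) {
            throw new IllegalArgumentException("Invalid " + label + ": " + value);
        }
        return value;
    }

    /** Builds "cluster/role/environment/name", the order seen in the example path. */
    String toPath(String cluster) {
        return requireSegment(cluster, "cluster") + "/" + role + "/" + environment + "/" + name;
    }

    /** Parses a four-segment path; the leading cluster segment is not part of the key. */
    static JobKeyPath fromPath(String path) {
        String[] parts = path.split("/", -1);
        if (parts.length != 4) {
            throw new IllegalArgumentException("Expected cluster/role/environment/name: " + path);
        }
        requireSegment(parts[0], "cluster");
        return new JobKeyPath(parts[1], parts[2], parts[3]);
    }
}
```

For example, new JobKeyPath("centos", "devel", "hello_world").toPath("example") yields "example/centos/devel/hello_world", and JobKeyPath.fromPath on that string recovers the same three key fields.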


o    Monitor status of job submitted

•  I submitted 2 jobs: one whose resource requirements the cluster could satisfy, and another
with larger requirements than the Mesos cluster could offer.

•  The first job ran fine, whereas the second couldn’t be scheduled since there were insufficient
resources.

•  I was able to get the status of the active job, and also the status of the PENDING job,
along with the reason why it is PENDING. The response received for the PENDING job is:
PendingReason(taskId:centos-devel-hello_pending-0-1cabf9d3-d315-4bd9-bf1c-8121f4801084, reason:Insufficient: CPU)
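
To illustrate how the PENDING response above could feed a provisioning decision, here is a minimal Java sketch that parses the printed form of that response. The class PendingReasonText is hypothetical, and the pattern only assumes the exact text layout shown above. With the Thrift-generated classes one would presumably read the taskId and reason fields from the struct directly, so string parsing like this is only relevant when working from logged output.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical helper, not part of Aurora: parses the printed form
 * "PendingReason(taskId:..., reason:...)" quoted in this thread.
 */
final class PendingReasonText {
    // Assumes the layout shown in the thread: a task id without commas, then the reason.
    private static final Pattern FORMAT =
            Pattern.compile("PendingReason\\(taskId:([^,]+),\\s*reason:\\s*(.*)\\)");

    final String taskId;
    final String reason;

    private PendingReasonText(String taskId, String reason) {
        this.taskId = taskId;
        this.reason = reason;
    }

    /** Returns the parsed fields, or empty if the text does not match the assumed layout. */
    static Optional<PendingReasonText> parse(String text) {
        if (text == null) {
            return Optional.empty();
        }
        // Collapse line breaks and repeated spaces, since the output may be wrapped.
        String normalized = text.trim().replaceAll("\\s+", " ");
        Matcher m = FORMAT.matcher(normalized);
        if (!m.matches()) {
            return Optional.empty();
        }
        return Optional.of(new PendingReasonText(m.group(1).trim(), m.group(2).trim()));
    }

    /** True when the reason text reports a shortage, e.g. "Insufficient: CPU". */
    boolean isResourceShortage() {
        return reason.startsWith("Insufficient");
    }
}
```

A caller could check isResourceShortage() on the parsed result and, if it is true, trigger whatever logic adds capacity to the cluster.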


o    Retrieve a list of running jobs

•  The response contains detailed information about each job.

•  Sample parsed response:
# instanceCount: 1
   >> Job Key <<
         # name: hello_world
         # role: centos
         # environment: devel
   >> Identity <<
         # owner: centos
   >> Task Config <<
         # numCPUs: 0.1
         # diskMb: 8
         # ramMb: 1

# priority: 0

Next Steps:


•         Complete implementation for all functions relevant to Airavata job submission/monitoring.

•         Dynamically add slaves based on health of jobs/cluster.

•         Find out how to enable socket based communication using binary protocol with Aurora
Scheduler on our cluster.

Thanks and Regards,
Gourav Shenoy

From: "Shenoy, Gourav Ganesh" <goshenoy@indiana.edu>
Reply-To: "dev@airavata.apache.org" <dev@airavata.apache.org>
Date: Friday, October 14, 2016 at 11:04 PM
To: "dev@airavata.apache.org" <dev@airavata.apache.org>
Subject: Apache Aurora Scheduler APIs (Thrift)

Hi dev,

I am working on building a Thrift client for the Apache Aurora scheduler running on a Mesos
cluster. The Apache Aurora documentation provides very little information about
the Thrift APIs that Aurora exposes. One way to find out which services are exposed is
to go through the "api.thrift" file in the Aurora GitHub repository (https://github.com/apache/aurora/blob/master/api/src/main/thrift/org/apache/aurora/gen/api.thrift),
but reading through that file to figure out the APIs can be daunting.

I have installed Aurora on a Mesos cluster on EC2 to carry out tests, and the UI dashboard
for Aurora provides a wide range of useful information. On the dashboard they have provided
a link "Scheduler API" which gives a comprehensive list of all Thrift services/APIs that the
Aurora scheduler exposes. I think this is very useful for anyone who plans to write a client.

I have taken a dump of this HTML page and uploaded it to S3 for reference:
https://s3-us-west-2.amazonaws.com/apache-aurora/thrift_module_api.htm

Snapshot: [inline image: image001.png]

Thanks and Regards,
Gourav Shenoy


