aurora-dev mailing list archives

From "Erb, Stephan" <Stephan....@blue-yonder.com>
Subject Re: Speeding up Aurora client job creation
Date Thu, 12 Feb 2015 09:15:30 GMT
Hi Hussein,

We also had slight performance problems when talking to Aurora. We ended up using the existing
Python client directly in our code (see apache.aurora.client.api.__init__.py). This allowed
us to reuse the API object and its scheduler connection, dropping a connection latency of
about 0.3-0.4 seconds per request.
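For illustration, the payoff of reusing one client object can be sketched with a stand-in class. The names below are hypothetical (the real API object lives in apache.aurora.client.api.__init__.py); only the connection-reuse pattern is the point:

```python
import time

class SchedulerClient:
    """Hypothetical stand-in for the Aurora client API object."""
    CONNECT_LATENCY = 0.05  # stand-in for the ~0.3-0.4 s handshake we observed

    def __init__(self):
        time.sleep(self.CONNECT_LATENCY)  # simulate scheduler connection setup

    def create_job(self, job_config):
        return "OK"  # a real client would issue the RPC here

def submit_naive(jobs):
    # New client per job: pays the connection latency on every request.
    return [SchedulerClient().create_job(j) for j in jobs]

def submit_reused(jobs):
    # One shared client: the connection latency is paid exactly once.
    api = SchedulerClient()
    return [api.create_job(j) for j in jobs]
```

With N jobs, the naive loop spends N x CONNECT_LATENCY on handshakes alone, while the reused client pays that cost once.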

Best Regards,
Stephan
________________________________________
From: Bill Farner <wfarner@apache.org>
Sent: Wednesday, February 11, 2015 9:29 PM
To: dev@aurora.incubator.apache.org
Subject: Re: Speeding up Aurora client job creation

To reduce that time you will indeed want to talk directly to the
scheduler.  This will definitely require you to roll up your sleeves a bit
and set up a thrift client to our api (based on api.thrift [1]), since you
will need to specify your tasks in a format that the thermos executor can
understand.  Turns out this is JSON data, so it should not be *too*
prohibitive.

However, there is another technical limitation you will hit for the
submission rate you are after.  The scheduler is backed by a durable store
whose write latency is at minimum the amount of time required to fsync.
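That floor is easy to measure directly. A minimal sketch (not Aurora code, just the cost of the underlying durability step):

```python
import os
import tempfile
import time

def measure_fsync_latency(writes=20, payload=b"x" * 512):
    """Average the cost of an append + fsync cycle, the same durability
    step a write-ahead log pays on every committed transaction."""
    fd, path = tempfile.mkstemp()
    try:
        total = 0.0
        for _ in range(writes):
            start = time.perf_counter()
            os.write(fd, payload)
            os.fsync(fd)  # block until the data is on stable storage
            total += time.perf_counter() - start
        return total / writes
    finally:
        os.close(fd)
        os.remove(path)
```

On spinning disks this is typically on the order of milliseconds per call, which caps the sequential write rate of a durably-backed store no matter how fast the client side is.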

[1]
https://github.com/apache/incubator-aurora/blob/master/api/src/main/thrift/org/apache/aurora/gen/api.thrift

-=Bill

On Wed, Feb 11, 2015 at 11:46 AM, Hussein Elgridly <
hussein@broadinstitute.org> wrote:

> Hi folks,
>
> I'm looking at a use case that involves submitting potentially hundreds of
> jobs a second to our Mesos cluster. My tests show that the aurora client is
> taking 1-2 seconds for each job submission, and that I can run about four
> client processes in parallel before they peg the CPU at 100%. I need more
> throughput than this!
>
> Squashing jobs down to the Process or Task level doesn't really make sense
> for our use case. I'm aware that with some shenanigans I can batch jobs
> together using job instances, but that's a lot of work on my current
> timeframe (and of questionable utility given that the jobs certainly won't
> have identical resource requirements).
>
> What I really need is (at least) an order of magnitude speedup in terms of
> being able to submit jobs to the Aurora scheduler (via the client or
> otherwise).
>
> Conceptually it doesn't seem like adding a job to a queue should be a thing
> that takes a couple of seconds, so I'm baffled as to why it's taking so
> long. As an experiment, I wrapped the call to client.execute() in
> client.py:proxy_main in cProfile and called aurora job create with a very
> simple test job.
>
> Results of the profile are in the Gist below:
>
> https://gist.github.com/helgridly/b37a0d27f04a37e72bb5
>
> Out of the 0.977s profile time, the two things that stick out to me are:
>
> 1. 0.526s spent in Pystachio for a job that doesn't use any templates
> 2. 0.564s spent in create_job, presumably talking to the scheduler (and
> setting up the machinery for doing so)
>
> I imagine I can sidestep #1 with a check for "{{" in the job file and
> bypass Pystachio entirely. Can I also skip the Aurora client entirely and
> talk directly to the scheduler? If so what does that entail, and are there
> any risks associated?
>
> Thanks,
> -Hussein
>
> Hussein Elgridly
> Senior Software Engineer, DSDE
> The Broad Institute of MIT and Harvard
>
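The profiling approach Hussein describes above (wrapping the call to client.execute() in cProfile) can be sketched generically; profile_call below is a hypothetical helper, not part of the Aurora client:

```python
import cProfile
import io
import pstats

def profile_call(fn, *args, **kwargs):
    """Run fn under cProfile and return (result, stats_report), mirroring
    the wrap-one-call approach used on client.execute() in proxy_main."""
    profiler = cProfile.Profile()
    result = profiler.runcall(fn, *args, **kwargs)
    buf = io.StringIO()
    stats = pstats.Stats(profiler, stream=buf)
    stats.sort_stats("cumulative").print_stats(10)  # top 10 by cumulative time
    return result, buf.getvalue()
```

Sorting by cumulative time is what surfaces coarse buckets like the Pystachio and create_job costs in the gist above.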