aurora-dev mailing list archives

From "Erb, Stephan" <>
Subject Re: Speeding up Aurora client job creation
Date Thu, 12 Feb 2015 09:15:30 GMT
Hi Hussein,

We also had slight performance problems when talking to Aurora. We ended up using the existing
Python client directly in our code. This allowed us to reuse the api object and its scheduler
connection, eliminating a connection latency of about 0.3-0.4 seconds per request.
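The win here is amortizing connection setup across many requests. A toy sketch of the pattern (the `SlowConnection` class and the 0.3 s delay are hypothetical stand-ins for the real scheduler connection, not Aurora APIs):

```python
import time

class SlowConnection:
    """Stand-in for an expensive scheduler connection; the sleep
    mimics the ~0.3 s per-request connection latency."""
    def __init__(self):
        time.sleep(0.3)  # simulated handshake/auth cost

    def submit(self, job):
        return "ok"

def submit_per_request(jobs):
    # Naive approach: open a fresh connection for every job.
    return [SlowConnection().submit(j) for j in jobs]

def submit_reusing(jobs):
    # Reuse one connection, analogous to reusing the api object.
    conn = SlowConnection()
    return [conn.submit(j) for j in jobs]

if __name__ == "__main__":
    jobs = ["job-%d" % i for i in range(3)]
    t0 = time.time()
    submit_per_request(jobs)
    naive = time.time() - t0
    t0 = time.time()
    submit_reusing(jobs)
    reused = time.time() - t0
    print("per-request: %.2fs, reused: %.2fs" % (naive, reused))
```

With three jobs, the naive version pays the setup delay three times; the reusing version pays it once.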

Best Regards,
Stephan

From: Bill Farner <>
Sent: Wednesday, February 11, 2015 9:29 PM
Subject: Re: Speeding up Aurora client job creation

To reduce that time you will indeed want to talk directly to the
scheduler.  This will definitely require you to roll up your sleeves a bit
and set up a thrift client to our api (based on api.thrift [1]), since you
will need to specify your tasks in a format that the thermos executor can
understand.  Turns out this is JSON data, so it should not be *too* difficult.

However, there is another technical limitation you will hit for the
submission rate you are after.  The scheduler is backed by a durable store
whose write latency is at minimum the amount of time required to fsync.
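That fsync floor is easy to measure on a given machine. A quick sketch (a throwaway temp file stands in for the scheduler's durable log):

```python
import os
import tempfile
import time

def fsync_latency(samples=20):
    """Average the cost of appending a small record and fsyncing,
    roughly what a durable write-ahead log pays per write."""
    fd, path = tempfile.mkstemp()
    try:
        total = 0.0
        for _ in range(samples):
            os.write(fd, b"x" * 128)
            t0 = time.time()
            os.fsync(fd)  # durability barrier; this is the latency floor
            total += time.time() - t0
        return total / samples
    finally:
        os.close(fd)
        os.remove(path)

if __name__ == "__main__":
    print("avg fsync latency: %.6f s" % fsync_latency())
```

On spinning disks this is typically milliseconds per write, which bounds how many individually durable submissions per second a single scheduler can accept.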



On Wed, Feb 11, 2015 at 11:46 AM, Hussein Elgridly <> wrote:

> Hi folks,
> I'm looking at a use case that involves submitting potentially hundreds of
> jobs a second to our Mesos cluster. My tests show that the Aurora client is
> taking 1-2 seconds for each job submission, and that I can run about four
> client processes in parallel before they peg the CPU at 100%. I need more
> throughput than this!
> Squashing jobs down to the Process or Task level doesn't really make sense
> for our use case. I'm aware that with some shenanigans I can batch jobs
> together using job instances, but that's a lot of work on my current
> timeframe (and of questionable utility given that the jobs certainly won't
> have identical resource requirements).
> What I really need is (at least) an order of magnitude speedup in terms of
> being able to submit jobs to the Aurora scheduler (via the client or
> otherwise).
> Conceptually it doesn't seem like adding a job to a queue should be a thing
> that takes a couple of seconds, so I'm baffled as to why it's taking so
> long. As an experiment, I wrapped the call to client.execute() in cProfile
> and called aurora job create with a very simple test job.
> Results of the profile are in the Gist below:
> Out of a 0.977s profile time, the two things that stick out to me are:
> 1. 0.526s spent in Pystachio for a job that doesn't use any templates
> 2. 0.564s spent in create_job, presumably talking to the scheduler (and
> setting up the machinery for doing so)
> I imagine I can sidestep #1 with a check for "{{" in the job file and
> bypass Pystachio entirely. Can I also skip the Aurora client entirely and
> talk directly to the scheduler? If so what does that entail, and are there
> any risks associated?
> Thanks,
> -Hussein
> Hussein Elgridly
> Senior Software Engineer, DSDE
> The Broad Institute of MIT and Harvard
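For reference, the profiling approach Hussein describes, wrapping the client entry point in cProfile, looks roughly like this (here profiling a placeholder function rather than the real client.execute()):

```python
import cProfile
import io
import pstats

def create_job():
    # Placeholder for the real client.execute() call being profiled.
    return sum(i * i for i in range(100000))

profiler = cProfile.Profile()
profiler.enable()
create_job()
profiler.disable()

# Summarize cumulative time per call, the same view behind the
# 0.526s/0.564s numbers quoted above.
buf = io.StringIO()
pstats.Stats(profiler, stream=buf).sort_stats("cumulative").print_stats(5)
print(buf.getvalue())
```

The cumulative column makes it easy to spot which subtree (templating vs. the scheduler round trip) dominates a single invocation.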