aurora-dev mailing list archives

From Bryan Helmkamp <br...@codeclimate.com>
Subject Re: Suitability of Aurora for one-time tasks
Date Wed, 26 Feb 2014 20:45:22 GMT
Thanks, Kevin. The idea of always-on workers of varying sizes is
effectively what we have right now in our non-Mesos world. The problem
is that sometimes we end up with not enough workers for certain
classes of jobs (e.g. High Memory), while part of the cluster sits
idle.

Conceptually, in my mind we would define approximately a dozen Tasks,
one for each type of work we need to perform (with different resource
requirements), and then run Jobs, each with a Task and a unique
payload, but I don't think this model works with Mesos. It seems we'd
need to create a unique Task for every Job.

-Bryan
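
To make the "one Job per unit of work" model concrete, here is a minimal sketch in plain Python. It only shows how a fixed set of templates could be combined with a per-run payload to produce a unique job key of the (role, environment, name) form described in the quoted message below. Everything in it is an assumption for illustration: `TaskTemplate`, `TEMPLATES`, `job_key`, `describe_job`, and the resource numbers are hypothetical and are not part of Aurora. How the resulting description would be turned into an `.aurora` config and submitted is deliberately left out.

```python
import re
import uuid
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskTemplate:
    """Hypothetical description of one class of work (not an Aurora type)."""
    name: str
    cpu: float
    ram_gb: int
    disk_gb: int


# Illustrative templates only; the resource numbers are made up.
TEMPLATES = {
    "high_mem": TaskTemplate("high_mem", cpu=2.0, ram_gb=32, disk_gb=16),
    "high_cpu": TaskTemplate("high_cpu", cpu=16.0, ram_gb=4, disk_gb=16),
}


def job_key(role, environment, template, work_id):
    """Build a unique (role, environment, name) key for one unit of work."""
    # Restrict the name to a conservative character set. The exact rules
    # Aurora enforces on job names are not stated in this thread.
    safe_id = re.sub(r"[^A-Za-z0-9_-]", "-", work_id)
    return (role, environment, f"{template.name}-{safe_id}")


def describe_job(role, environment, template_name, payload, work_id=None):
    """Return a plain dict describing a single-task job for one payload."""
    template = TEMPLATES[template_name]
    work_id = work_id or uuid.uuid4().hex  # Random id when none is supplied.
    return {
        "key": job_key(role, environment, template, work_id),
        "resources": {
            "cpu": template.cpu,
            "ram_gb": template.ram_gb,
            "disk_gb": template.disk_gb,
        },
        "instances": 1,
        "payload": payload,
    }
```

With this shape, the dozen or so templates stay fixed and only the job name and payload change per run, e.g. `describe_job("service-account-name", "prod", "high_mem", "repo=example/repo")`.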

On Wed, Feb 26, 2014 at 3:35 PM, Kevin Sweeney <kevints@apache.org> wrote:
> A job is a group of nearly-identical tasks plus some constraints like rack
> diversity. The scheduler considers each task within a job equivalently
> schedulable, so you can't vary things like resource footprint. It's
> perfectly fine to have several jobs with just a single task, as long as
> each has a different job key (which is (role, environment, name)).
>
> Another approach is to have a bunch of uniform always-on workers (in
> different sizes). This can be expressed as a Service like so:
>
> # workers.aurora
> class Profile(Struct):
>   queue_name = Required(String)
>   resources = Required(Resources)
>   instances = Required(Integer)
>
> HIGH_MEM = Resources(cpu = 8.0, ram = 32 * GB, disk = 64 * GB)
> HIGH_CPU = Resources(cpu = 16.0, ram = 4 * GB, disk = 64 * GB)
>
> work_forever = Process(name = 'work_forever',
>   cmdline = '''
>     # TODO: Replace this with something that isn't pseudo-bash
>     while true; do
>       work_item=`take_from_work_queue {{profile.queue_name}}`
>       do_work "$work_item"
>       tell_work_queue_finished "{{profile.queue_name}}" "$work_item"
>     done
>   ''')
>
> task = Task(processes = [work_forever],
>   resources = '{{profile.resources}}',  # Note this is static per queue-name.
> )
>
> service = Service(
>   task = task,
>   cluster = 'west',
>   role = 'service-account-name',
>   environment = 'prod',
>   name = '{{profile.queue_name}}_processor',
>   instances = '{{profile.instances}}',  # Scale here.
> )
>
> jobs = [
>   service.bind(profile = Profile(
>     resources = HIGH_MEM,
>     queue_name = 'graph_traversals',
>     instances = 50,
>   )),
>   service.bind(profile = Profile(
>     resources = HIGH_CPU,
>     queue_name = 'compilations',
>     instances = 200,
>   )),
> ]
>
>
> On Wed, Feb 26, 2014 at 11:46 AM, Bryan Helmkamp <bryan@codeclimate.com> wrote:
>
>> Thanks, Bill.
>>
>> Am I correct in understanding that it is not possible to parameterize
>> individual Jobs, just Tasks? Therefore, since I don't know the job
>> definitions up front, I will have parameterized Task templates, and
>> generate a new Task every time I need to run a Job?
>>
>> Is that the recommended route?
>>
>> Our work is very non-uniform so I don't think work-stealing would be
>> efficient for us.
>>
>> -Bryan
>>
>> On Wed, Feb 26, 2014 at 12:49 PM, Bill Farner <wfarner@apache.org> wrote:
>> > Thanks for checking out Aurora!
>> >
>> > My short answer is that Aurora should handle thousands of short-lived
>> > tasks/jobs per day without trouble.  (If you proceed with this approach
>> > and encounter performance issues, feel free to file tickets!)  The DSL
>> > does have some mechanisms for parameterization.  In your case, since you
>> > probably don't know all the job definitions up front, you'll probably
>> > want to parameterize with environment variables.  I don't see this
>> > described in our docs, but there's a little detail at the option
>> > declaration [1].
>> >
>> > Another approach worth considering is work-stealing, using a single job
>> > as your pool of workers.  I would find this easier to manage, but it
>> > would only be suitable if your work items are sufficiently uniform.
>> >
>> > Feel free to continue the discussion!  We're also pretty active in our
>> > IRC channel if you'd prefer that medium.
>> >
>> >
>> > [1]
>> > https://github.com/apache/incubator-aurora/blob/master/src/main/python/apache/aurora/client/options.py#L170-L183
>> >
>> >
>> > -=Bill
>> >
>> >
>> > On Tue, Feb 25, 2014 at 10:11 PM, Bryan Helmkamp <bryan@codeclimate.com> wrote:
>> >
>> >> Hello,
>> >>
>> >> I am considering Aurora for a key component of our infrastructure.
>> >> Awesome work being done here.
>> >>
>> >> My question is: How suitable is Aurora for running short-lived tasks?
>> >>
>> >> Background: We (Code Climate) do static analysis of tens of thousands
>> >> of repositories every day. We run a variety of forms of analysis, with
>> >> heterogeneous resource requirements, and thus our interest in Mesos.
>> >>
>> >> Looking at Aurora, a lot of the core features look very helpful to us.
>> >> Where I am getting hung up is figuring out how to model short-lived
>> >> tasks as tasks/jobs. Long-running resource allocations are not really
>> >> an option for us due to the variation in our workloads.
>> >>
>> >> My first thought was to create a Task for each type of analysis we
>> >> run, and then start a new Job with the appropriate Task every time we
>> >> want to run analysis (regulated by a queue). This doesn't seem to work
>> >> though. I can't `aurora create` the same `.aurora` file multiple times
>> >> with different Job names (as far as I can tell). Also there is the
>> >> problem of how to customize each Job slightly (e.g. a payload).
>> >>
>> >> An obvious alternative is to create a unique Task every time we want
>> >> to run work. This would result in tens of thousands of tasks being
>> >> created every day, and from what I can tell Aurora does not intend to
>> >> be used like that. (Please correct me if I am wrong.)
>> >>
>> >> Basically, I would like to hook my job queue up to Aurora to perform
>> >> the actual work. There are a dozen different types of jobs, each with
>> >> different performance requirements. Every time a job runs, it has a
>> >> unique payload containing the definition of the work to be
>> >> performed.
>> >>
>> >> Can Aurora be used this way? If so, what is the proper way to model
>> >> this with respect to Jobs and Tasks?
>> >>
>> >> Any/all help is appreciated.
>> >>
>> >> Thanks!
>> >>
>> >> -Bryan
>> >>
>> >> --
>> >> Bryan Helmkamp, Founder, Code Climate
>> >> bryan@codeclimate.com / 646-379-1810 / @brynary
>> >>
>>
>>
>>
>> --
>> Bryan Helmkamp, Founder, Code Climate
>> bryan@codeclimate.com / 646-379-1810 / @brynary
>>



-- 
Bryan Helmkamp, Founder, Code Climate
bryan@codeclimate.com / 646-379-1810 / @brynary
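
For reference, the pseudo-bash `work_forever` loop in Kevin's quoted config could be sketched in Python roughly as follows. This is only an illustration of the always-on worker pattern, under stated assumptions: it uses the standard library's in-process `queue.Queue` as a stand-in for whatever external work queue would really be used, and `do_work`, `should_stop`, and `poll_seconds` are hypothetical parameters that do not come from the thread.

```python
import queue


def work_forever(work_queue, do_work, should_stop=lambda: False, poll_seconds=1.0):
    """Take items from work_queue and process them until should_stop() is true.

    Returns the number of items processed successfully.
    """
    processed = 0
    while not should_stop():
        try:
            item = work_queue.get(timeout=poll_seconds)
        except queue.Empty:
            continue  # Nothing queued yet; check should_stop() and poll again.
        try:
            do_work(item)
            processed += 1
        finally:
            # Plays the role of tell_work_queue_finished in the pseudo-bash
            # version. It runs even if do_work raised, so join() cannot hang.
            work_queue.task_done()
    return processed
```

In a real always-on worker `should_stop` would keep its default, so the loop runs until the process is killed. One such loop per worker size (for example one pool for high-memory items and one for high-CPU items) corresponds to the per-profile Services in the quoted config.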
