aurora-dev mailing list archives

From Kevin Sweeney <kevi...@apache.org>
Subject Re: Suitability of Aurora for one-time tasks
Date Thu, 27 Feb 2014 02:04:56 GMT
And after a bit of code spelunking, it turns out the semantics you want
already exist (they're just undocumented). I've updated the ticket to cover
documenting this.


On Wed, Feb 26, 2014 at 6:00 PM, Kevin Sweeney <kevints@apache.org> wrote:

> The example I gave is somewhat syntactically invalid due to coding via
> email, but that's more or less what the interface will look like. I also
> filed https://issues.apache.org/jira/browse/AURORA-236 for more
> first-class support of the semantics I think you want (though currently you
> can fake it by setting max_failures to a very high number).
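>
> To sketch that workaround (untested, written in email again, and I may be
> misremembering field names), a one-shot job could look roughly like this,
> using the Job-level max_task_failures:
>
> # one_shot.aurora (sketch only)
> run_once = Process(
>   name = 'run_once',
>   cmdline = 'do_work "{{work_item}}"',
> )
>
> one_shot_task = Task(
>   processes = [run_once],
>   resources = Resources(cpu = 1.0, ram = 1 * GB, disk = 1 * GB),
> )
>
> jobs = [Job(
>   task = one_shot_task,
>   cluster = 'west',
>   role = 'service-account-name',
>   environment = 'prod',
>   name = 'process_{{work_item}}',
>   service = False,           # run to completion instead of restarting
>   max_task_failures = 100,   # the "very high number": retry failures until it succeeds
> )]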
>
>
>> On Wed, Feb 26, 2014 at 5:33 PM, Bryan Helmkamp <bryan@codeclimate.com> wrote:
>
>> Thanks, Kevin. That pretty much looks like exactly what I need.
>>
>> -Bryan
>>
>> On Wed, Feb 26, 2014 at 8:16 PM, Kevin Sweeney <kevints@apache.org>
>> wrote:
>> > For a more dynamic approach to resource utilization you can use
>> > something like this:
>> >
>> > # dynamic.aurora
>> > # Enqueue each individual work-item with:
>> > #   aurora create -E work_item=$work_item -E resource_profile=graph_traversals \
>> > #     west/service-account-name/prod/process_$work_item
>> > class Profile(Struct):
>> >   queue_name = Required(String)
>> >   resources = Required(Resources)
>> >
>> > HIGH_MEM = Resources(cpu = 8.0, ram = 32 * GB, disk = 64 * GB)
>> > HIGH_CPU = Resources(cpu = 16.0, ram = 4 * GB, disk = 64 * GB)
>> >
>> > work_on_one_item = Process(name = 'work_on_one_item',
>> >   cmdline = '''
>> >     do_work "{{work_item}}"
>> >   ''',
>> > )
>> >
>> > task = Task(processes = [work_on_one_item],
>> >   resources = '{{resources[{{resource_profile}}]}}')
>> >
>> > job = Job(
>> >   task = task,
>> >   cluster = 'west',
>> >   role = 'service-account-name',
>> >   environment = 'prod',
>> >   name = 'process_{{work_item}}',
>> > )
>> >
>> > resources = {
>> >   'graph_traversals': HIGH_MEM,
>> >   'compilations': HIGH_CPU,
>> > }
>> >
>> > jobs = [job.bind(resources = resources)]
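>> >
>> > To make the bindings concrete (still untested email-code, and the values
>> > below are made up), one enqueued work item would expand roughly like this:
>> >
>> > #   aurora create -E work_item=12345 -E resource_profile=graph_traversals \
>> > #     west/service-account-name/prod/process_12345
>> > # is roughly equivalent to binding the template by hand:
>> > example_job = job.bind(
>> >   resources = resources,
>> >   work_item = '12345',
>> >   resource_profile = 'graph_traversals',
>> > )
>> > # -> job name 'process_12345', cmdline do_work "12345", HIGH_MEM resources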
>> >
>> >
>> >
>> > On Wed, Feb 26, 2014 at 1:08 PM, Bryan Helmkamp <bryan@codeclimate.com> wrote:
>> >
>> >> Sure. Yes, they are shell commands and yes they are provided different
>> >> configuration on each run.
>> >>
>> >> In effect, we have a number of different job types that are queued up,
>> >> and we need to run them as quickly as possible. Each job type has
>> >> different resource requirements. Every time we run a job, we provide
>> >> different arguments (the "payload"). For example:
>> >>
>> >> $ ./do_something.sh SOME_ID (Requires 1 CPU and 1GB RAM)
>> >> $ ./do_something_else.sh SOME_OTHER_ID (Requires 4 CPU and 4GB RAM)
>> >> [... there are about 12 of these ...]
>> >>
>> >> -Bryan
>> >>
>> >> On Wed, Feb 26, 2014 at 3:58 PM, Bill Farner <wfarner@apache.org> wrote:
>> >> > Can you offer some more details on what the workload execution looks
>> >> > like?  Are these shell commands?  An application that's provided
>> >> > different configuration?
>> >> >
>> >> > -=Bill
>> >> >
>> >> >
>> >> > On Wed, Feb 26, 2014 at 12:45 PM, Bryan Helmkamp <bryan@codeclimate.com> wrote:
>> >> >
>> >> >> Thanks, Kevin. The idea of always-on workers of varying sizes is
>> >> >> effectively what we have right now in our non-Mesos world. The
>> >> >> problem is that sometimes we end up with not enough workers for
>> >> >> certain classes of jobs (e.g. High Memory), while part of the cluster
>> >> >> sits idle.
>> >> >>
>> >> >> Conceptually, in my mind we would define approximately a dozen Tasks,
>> >> >> one for each type of work we need to perform (with different resource
>> >> >> requirements), and then run Jobs, each with a Task and a unique
>> >> >> payload, but I don't think this model works with Mesos. It seems we'd
>> >> >> need to create a unique Task for every Job.
>> >> >>
>> >> >> -Bryan
>> >> >>
>> >> >> On Wed, Feb 26, 2014 at 3:35 PM, Kevin Sweeney <kevints@apache.org> wrote:
>> >> >> > A job is a group of nearly-identical tasks plus some constraints,
>> >> >> > like rack diversity. The scheduler considers each task within a job
>> >> >> > equivalently schedulable, so you can't vary things like resource
>> >> >> > footprint. It's perfectly fine to have several jobs with just a
>> >> >> > single task, as long as each has a different job key (which is
>> >> >> > (role, environment, name)).
>> >> >> >
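>> >> >> > To sketch that (untested, and the names below are made up): one task
>> >> >> > template, several jobs whose keys differ only by name.
>> >> >> >
>> >> >> > base = Job(
>> >> >> >   task = task,                    # same task definition for every job
>> >> >> >   cluster = 'west',
>> >> >> >   role = 'service-account-name',
>> >> >> >   environment = 'prod',
>> >> >> >   instances = 1,
>> >> >> > )
>> >> >> >
>> >> >> > jobs = [
>> >> >> >   base(name = 'analyze_repo_a'),  # distinct name, so a distinct (role, environment, name) key
>> >> >> >   base(name = 'analyze_repo_b'),
>> >> >> > ]
>> >> >> >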
>> >> >> > Another approach is to have a bunch of uniform always-on workers
>> >> >> > (in different sizes). This can be expressed as a Service like so:
>> >> >> >
>> >> >> > # workers.aurora
>> >> >> > class Profile(Struct):
>> >> >> >   queue_name = Required(String)
>> >> >> >   resources = Required(Resources)
>> >> >> >   instances = Required(Integer)
>> >> >> >
>> >> >> > HIGH_MEM = Resources(cpu = 8.0, ram = 32 * GB, disk = 64 * GB)
>> >> >> > HIGH_CPU = Resources(cpu = 16.0, ram = 4 * GB, disk = 64 * GB)
>> >> >> >
>> >> >> > work_forever = Process(name = 'work_forever',
>> >> >> >   cmdline = '''
>> >> >> >     # TODO: Replace this with something that isn't pseudo-bash
>> >> >> >     while true; do
>> >> >> >       work_item=`take_from_work_queue {{profile.queue_name}}`
>> >> >> >       do_work "$work_item"
>> >> >> >       tell_work_queue_finished "{{profile.queue_name}}" "$work_item"
>> >> >> >     done
>> >> >> >   ''')
>> >> >> >
>> >> >> > task = Task(processes = [work_forever],
>> >> >> >   resources = '{{profile.resources}}',  # Note this is static per queue-name.
>> >> >> > )
>> >> >> >
>> >> >> > service = Service(
>> >> >> >   task = task,
>> >> >> >   cluster = 'west',
>> >> >> >   role = 'service-account-name',
>> >> >> >   environment = 'prod',
>> >> >> >   name = '{{profile.queue_name}}_processor',
>> >> >> >   instances = '{{profile.instances}}',  # Scale here.
>> >> >> > )
>> >> >> >
>> >> >> > jobs = [
>> >> >> >   service.bind(profile = Profile(
>> >> >> >     resources = HIGH_MEM,
>> >> >> >     queue_name = 'graph_traversals',
>> >> >> >     instances = 50,
>> >> >> >   )),
>> >> >> >   service.bind(profile = Profile(
>> >> >> >     resources = HIGH_CPU,
>> >> >> >     queue_name = 'compilations',
>> >> >> >     instances = 200,
>> >> >> >   )),
>> >> >> > ]
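>> >> >> >
>> >> >> > Each binding above becomes a separate job (e.g.
>> >> >> > west/service-account-name/prod/graph_traversals_processor), so each
>> >> >> > pool scales independently, and adding another worker class later is
>> >> >> > just one more binding. Untested sketch, queue name and sizing made
>> >> >> > up:
>> >> >> >
>> >> >> > MEDIUM = Resources(cpu = 2.0, ram = 8 * GB, disk = 32 * GB)
>> >> >> >
>> >> >> > jobs = jobs + [service.bind(profile = Profile(
>> >> >> >   resources = MEDIUM,
>> >> >> >   queue_name = 'style_checks',
>> >> >> >   instances = 100,
>> >> >> > ))]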
>> >> >> >
>> >> >> >
>> >> >> > On Wed, Feb 26, 2014 at 11:46 AM, Bryan Helmkamp <bryan@codeclimate.com> wrote:
>> >> >> >
>> >> >> >> Thanks, Bill.
>> >> >> >>
>> >> >> >> Am I correct in understanding that it is not possible to
>> >> >> >> parameterize individual Jobs, just Tasks? Therefore, since I don't
>> >> >> >> know the job definitions up front, I will have parameterized Task
>> >> >> >> templates, and generate a new Task every time I need to run a Job?
>> >> >> >>
>> >> >> >> Is that the recommended route?
>> >> >> >>
>> >> >> >> Our work is very non-uniform so I don't think work-stealing would
>> >> >> >> be efficient for us.
>> >> >> >>
>> >> >> >> -Bryan
>> >> >> >>
>> >> >> >> On Wed, Feb 26, 2014 at 12:49 PM, Bill Farner <wfarner@apache.org> wrote:
>> >> >> >> > Thanks for checking out Aurora!
>> >> >> >> >
>> >> >> >> > My short answer is that Aurora should handle thousands of
>> >> >> >> > short-lived tasks/jobs per day without trouble.  (If you proceed
>> >> >> >> > with this approach and encounter performance issues, feel free to
>> >> >> >> > file tickets!)  The DSL does have some mechanisms for
>> >> >> >> > parameterization.  In your case, since you probably don't know
>> >> >> >> > all the job definitions upfront, you'll probably want to
>> >> >> >> > parameterize with environment variables.  I don't see this
>> >> >> >> > described in our docs, but there's a little detail at the option
>> >> >> >> > declaration [1].
>> >> >> >> >
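>> >> >> >> > As a rough illustration (just a sketch, untested; the payload_id
>> >> >> >> > binding name is made up, and -E is the flag from that option
>> >> >> >> > declaration), a reusable template plus a per-run binding might
>> >> >> >> > look like:
>> >> >> >> >
>> >> >> >> > # analysis.aurora (sketch)
>> >> >> >> > analyze = Process(
>> >> >> >> >   name = 'analyze',
>> >> >> >> >   cmdline = './do_something.sh "{{payload_id}}"',
>> >> >> >> > )
>> >> >> >> >
>> >> >> >> > jobs = [Job(
>> >> >> >> >   task = Task(
>> >> >> >> >     processes = [analyze],
>> >> >> >> >     resources = Resources(cpu = 1.0, ram = 1 * GB, disk = 8 * GB),
>> >> >> >> >   ),
>> >> >> >> >   cluster = 'west',
>> >> >> >> >   role = 'service-account-name',
>> >> >> >> >   environment = 'prod',
>> >> >> >> >   name = 'analyze_{{payload_id}}',
>> >> >> >> > )]
>> >> >> >> >
>> >> >> >> > # launched once per work item, e.g.
>> >> >> >> > #   aurora create -E payload_id=SOME_ID west/service-account-name/prod/analyze_SOME_ID
>> >> >> >> >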
>> >> >> >> > Another approach worth considering is work-stealing, using a
>> >> >> >> > single job as your pool of workers.  I would find this easier to
>> >> >> >> > manage, but it would only be suitable if your work items are
>> >> >> >> > sufficiently uniform.
>> >> >> >> >
>> >> >> >> > Feel free to continue the discussion!  We're also pretty active
>> >> >> >> > in our IRC channel if you'd prefer that medium.
>> >> >> >> >
>> >> >> >> >
>> >> >> >> > [1]
>> >> >> >> > https://github.com/apache/incubator-aurora/blob/master/src/main/python/apache/aurora/client/options.py#L170-L183
>> >> >> >> >
>> >> >> >> >
>> >> >> >> > -=Bill
>> >> >> >> >
>> >> >> >> >
>> >> >> >> > On Tue, Feb 25, 2014 at 10:11 PM, Bryan Helmkamp <bryan@codeclimate.com> wrote:
>> >> >> >> >
>> >> >> >> >> Hello,
>> >> >> >> >>
>> >> >> >> >> I am considering Aurora for a key component of our
>> >> >> >> >> infrastructure. Awesome work being done here.
>> >> >> >> >>
>> >> >> >> >> My question is: How suitable is Aurora for running short-lived
>> >> >> >> >> tasks?
>> >> >> >> >>
>> >> >> >> >> Background: We (Code Climate) do static analysis of tens of
>> >> >> >> >> thousands of repositories every day. We run a variety of forms
>> >> >> >> >> of analysis, with heterogeneous resource requirements, and thus
>> >> >> >> >> our interest in Mesos.
>> >> >> >> >>
>> >> >> >> >> Looking at Aurora, a lot of the core features look very helpful
>> >> >> >> >> to us. Where I am getting hung up is figuring out how to model
>> >> >> >> >> short-lived tasks as tasks/jobs. Long-running resource
>> >> >> >> >> allocations are not really an option for us due to the
>> >> >> >> >> variation in our workloads.
>> >> >> >> >>
>> >> >> >> >> My first thought was to create a Task for each type of analysis
>> >> >> >> >> we run, and then start a new Job with the appropriate Task every
>> >> >> >> >> time we want to run analysis (regulated by a queue). This
>> >> >> >> >> doesn't seem to work though. I can't `aurora create` the same
>> >> >> >> >> `.aurora` file multiple times with different Job names (as far
>> >> >> >> >> as I can tell). Also there is the problem of how to customize
>> >> >> >> >> each Job slightly (e.g. a payload).
>> >> >> >> >>
>> >> >> >> >> An obvious alternative is to create a unique Task every time we
>> >> >> >> >> want to run work. This would result in tens of thousands of
>> >> >> >> >> tasks being created every day, and from what I can tell Aurora
>> >> >> >> >> is not intended to be used like that. (Please correct me if I
>> >> >> >> >> am wrong.)
>> >> >> >> >>
>> >> >> >> >> Basically, I would like to hook my job queue up to Aurora to
>> >> >> >> >> perform the actual work. There are a dozen different types of
>> >> >> >> >> jobs, each with different performance requirements. Every time
>> >> >> >> >> a job runs, it has a unique payload containing the definition
>> >> >> >> >> of the work to be performed.
>> >> >> >> >>
>> >> >> >> >> Can Aurora be used this way? If so, what is the proper way to
>> >> >> >> >> model this with respect to Jobs and Tasks?
>> >> >> >> >>
>> >> >> >> >> Any/all help is appreciated.
>> >> >> >> >>
>> >> >> >> >> Thanks!
>> >> >> >> >>
>> >> >> >> >> -Bryan
>> >> >> >> >>
>> >> >> >> >> --
>> >> >> >> >> Bryan Helmkamp, Founder, Code Climate
>> >> >> >> >> bryan@codeclimate.com / 646-379-1810 / @brynary
>> >> >> >> >>
>> >> >> >>
>> >> >> >>
>> >> >> >>
>> >> >> >> --
>> >> >> >> Bryan Helmkamp, Founder, Code Climate
>> >> >> >> bryan@codeclimate.com / 646-379-1810 / @brynary
>> >> >> >>
>> >> >>
>> >> >>
>> >> >>
>> >> >> --
>> >> >> Bryan Helmkamp, Founder, Code Climate
>> >> >> bryan@codeclimate.com / 646-379-1810 / @brynary
>> >> >>
>> >>
>> >>
>> >>
>> >> --
>> >> Bryan Helmkamp, Founder, Code Climate
>> >> bryan@codeclimate.com / 646-379-1810 / @brynary
>> >>
>>
>>
>>
>> --
>> Bryan Helmkamp, Founder, Code Climate
>> bryan@codeclimate.com / 646-379-1810 / @brynary
>>
>
>
