aurora-dev mailing list archives

From Bill Farner <wfar...@apache.org>
Subject Re: Suitability of Aurora for one-time tasks
Date Thu, 27 Feb 2014 04:18:07 GMT
On Wed, Feb 26, 2014 at 7:45 PM, Bryan Helmkamp <bryan@codeclimate.com> wrote:

> Got it. Thanks. Do finished Jobs and Tasks get garbage collected
> automatically at some point?


> Otherwise it seems like they will stack up pretty fast. (We might run
> hundreds of thousands of jobs in a day.)
>

Jobs are garbage-collected after a configurable period of inactivity.  This
is tuned on the scheduler with the command-line arg history_prune_threshold;
the default is currently 2 days.
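
For example, the scheduler could be launched with that window set
explicitly (a hypothetical fragment; flag syntax may differ across
versions, so check your scheduler's help output):

```shell
# Hypothetical scheduler launch fragment; other required flags omitted.
aurora-scheduler \
  -history_prune_threshold=2days
```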


>
> BTW, Aurora does not seem to like the resources =
> '{{resources[{{resource_profile}}]}}' part. I tried to fix it, but
> keep getting:
>
>     InvalidConfigError: Expected dictionary argument, got
> '{{resources[{{resource_profile}}]}}'
>

Kevin -- does the DSL support nested interpolation?  Either way, maybe you
meant this:

task = Task(processes = [work_on_one_item],
  resources = '{{resources[{{work_item}}]}}')
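
For what it's worth, the nested-binding idea can be illustrated outside the
DSL. The sketch below is plain Python, not pystachio: it mimics
mustache-style substitution where the innermost references resolve first,
which is what '{{resources[{{work_item}}]}}' would need. The bind function,
its bindings dict, and the HIGH_CPU/HIGH_MEM values are all illustrative
stand-ins, not Aurora APIs.

```python
import re

# Matches an innermost {{...}} reference (no braces inside), so nested
# templates like {{resources[{{work_item}}]}} resolve inside-out.
_REF = re.compile(r"\{\{([^{}]+)\}\}")

def bind(template, bindings):
    """Repeatedly substitute {{name}} and {{dict[key]}} references."""
    def lookup(match):
        ref = match.group(1)
        if "[" in ref:  # dict-style reference, e.g. resources[compilations]
            name, _, key = ref.partition("[")
            return str(bindings[name][key.rstrip("]")])
        return str(bindings[ref])

    while _REF.search(template):
        template = _REF.sub(lookup, template)
    return template

bindings = {
    "work_item": "compilations",
    "resources": {"compilations": "HIGH_CPU", "graph_traversals": "HIGH_MEM"},
}
# First pass turns {{work_item}} into 'compilations'; the second pass
# resolves {{resources[compilations]}} to 'HIGH_CPU'.
print(bind("{{resources[{{work_item}}]}}", bindings))  # HIGH_CPU
```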


>
> (For now I'm using a different .aurora file for each resource
> configuration.)
>
> Best,
>
> -Bryan
>
> On Wed, Feb 26, 2014 at 9:04 PM, Kevin Sweeney <kevints@apache.org> wrote:
> > And after a bit of code spelunking the semantics you want already exist
> > (just undocumented). Updated the ticket to update the documentation.
> >
> >
> > On Wed, Feb 26, 2014 at 6:00 PM, Kevin Sweeney <kevints@apache.org> wrote:
> >
> >> The example I gave is somewhat syntactically invalid due to coding via
> >> email, but that's more or less what the interface will look like. I also
> >> filed https://issues.apache.org/jira/browse/AURORA-236 for more
> >> first-class support of the semantics I think you want (though currently
> you
> >> can fake it by setting max_failures to a very high number).
> >>
> >>
> >> On Wed, Feb 26, 2014 at 5:33 PM, Bryan Helmkamp <bryan@codeclimate.com> wrote:
> >>
> >>> Thanks, Kevin. That pretty much looks like exactly what I need.
> >>>
> >>> -Bryan
> >>>
> >>> On Wed, Feb 26, 2014 at 8:16 PM, Kevin Sweeney <kevints@apache.org>
> >>> wrote:
> >>> > For a more dynamic approach to resource utilization you can use
> >>> something
> >>> > like this:
> >>> >
> >>> > # dynamic.aurora
> >>> > # Enqueue each individual work-item with aurora create -E
> >>> > # work_item=$work_item -E resource_profile=graph_traversals
> >>> > # west/service-account-name/prod/process_$work_item
> >>> > class Profile(Struct):
> >>> >   queue_name = Required(String)
> >>> >   resources = Required(Resources)
> >>> >
> >>> > HIGH_MEM = Resources(cpu = 8.0, ram = 32 * GB, disk = 64 * GB)
> >>> > HIGH_CPU = Resources(cpu = 16.0, ram = 4 * GB, disk = 64 * GB)
> >>> >
> >>> > work_on_one_item = Process(name = 'work_on_one_item',
> >>> >   cmdline = '''
> >>> >     do_work "{{work_item}}"
> >>> >   ''',
> >>> > )
> >>> >
> >>> > task = Task(processes = [work_on_one_item],
> >>> >   resources = '{{resources[{{resource_profile}}]}}')
> >>> >
> >>> > job = Job(
> >>> >   task = task,
> >>> >   cluster = 'west',
> >>> >   role = 'service-account-name',
> >>> >   environment = 'prod',
> >>> >   name = 'process_{{work_item}}',
> >>> > )
> >>> >
> >>> > resources = {
> >>> >   'graph_traversals': HIGH_MEM,
> >>> >   'compilations': HIGH_CPU,
> >>> > }
> >>> >
> >>> > jobs = [job.bind(resources = resources)]
> >>> >
> >>> >
> >>> >
> >>> > On Wed, Feb 26, 2014 at 1:08 PM, Bryan Helmkamp <bryan@codeclimate.com> wrote:
> >>> >
> >>> >> Sure. Yes, they are shell commands and yes they are provided
> >>> >> different configuration on each run.
> >>> >>
> >>> >> In effect we have a number of different job types that are queued up,
> >>> >> and we need to run as quickly as possible. Each job type has different
> >>> >> resource requirements. Every time we run the job, we provide different
> >>> >> arguments (the "payload"). For example:
> >>> >>
> >>> >> $ ./do_something.sh SOME_ID (Requires 1 CPU and 1GB RAM)
> >>> >> $ ./do_something_else.sh SOME_OTHER_ID (Requires 4 CPU and 4GB RAM)
> >>> >> [... there are about 12 of these ...]
> >>> >>
> >>> >> -Bryan
> >>> >>
> >>> >> On Wed, Feb 26, 2014 at 3:58 PM, Bill Farner <wfarner@apache.org> wrote:
> >>> >> > Can you offer some more details on what the workload execution
> >>> >> > looks like?  Are these shell commands?  An application that's
> >>> >> > provided different configuration?
> >>> >> >
> >>> >> > -=Bill
> >>> >> >
> >>> >> >
> >>> >> > On Wed, Feb 26, 2014 at 12:45 PM, Bryan Helmkamp <bryan@codeclimate.com> wrote:
> >>> >> >
> >>> >> >> Thanks, Kevin. The idea of always-on workers of varying sizes is
> >>> >> >> effectively what we have right now in our non-Mesos world. The
> >>> >> >> problem is that sometimes we end up with not enough workers for
> >>> >> >> certain classes of jobs (e.g. High Memory), while part of the
> >>> >> >> cluster sits idle.
> >>> >> >>
> >>> >> >> Conceptually, in my mind we would define approximately a dozen
> >>> >> >> Tasks, one for each type of work we need to perform (with different
> >>> >> >> resource requirements), and then run Jobs, each with a Task and a
> >>> >> >> unique payload, but I don't think this model works with Mesos. It
> >>> >> >> seems we'd need to create a unique Task for every Job.
> >>> >> >> -Bryan
> >>> >> >>
> >>> >> >> On Wed, Feb 26, 2014 at 3:35 PM, Kevin Sweeney <kevints@apache.org> wrote:
> >>> >> >> > A job is a group of nearly-identical tasks plus some constraints
> >>> >> >> > like rack diversity. The scheduler considers each task within a
> >>> >> >> > job equivalently schedulable, so you can't vary things like
> >>> >> >> > resource footprint. It's perfectly fine to have several jobs with
> >>> >> >> > just a single task, as long as each has a different job key
> >>> >> >> > (which is (role, environment, name)).
> >>> >> >> >
> >>> >> >> > Another approach is to have a bunch of uniform always-on workers
> >>> >> >> > (in different sizes). This can be expressed as a Service like so:
> >>> >> >> >
> >>> >> >> > # workers.aurora
> >>> >> >> > class Profile(Struct):
> >>> >> >> >   queue_name = Required(String)
> >>> >> >> >   resources = Required(Resources)
> >>> >> >> >   instances = Required(Integer)
> >>> >> >> >
> >>> >> >> > HIGH_MEM = Resources(cpu = 8.0, ram = 32 * GB, disk = 64 * GB)
> >>> >> >> > HIGH_CPU = Resources(cpu = 16.0, ram = 4 * GB, disk = 64 * GB)
> >>> >> >> >
> >>> >> >> > work_forever = Process(name = 'work_forever',
> >>> >> >> >   cmdline = '''
> >>> >> >> >     # TODO: Replace this with something that isn't pseudo-bash
> >>> >> >> >     while true; do
> >>> >> >> >       work_item=`take_from_work_queue {{profile.queue_name}}`
> >>> >> >> >       do_work "$work_item"
> >>> >> >> >       tell_work_queue_finished "{{profile.queue_name}}" "$work_item"
> >>> >> >> >     done
> >>> >> >> >   ''')
> >>> >> >> >
> >>> >> >> > task = Task(processes = [work_forever],
> >>> >> >> >   resources = '{{profile.resources}}',  # Note this is static per queue-name.
> >>> >> >> > )
> >>> >> >> >
> >>> >> >> > service = Service(
> >>> >> >> >   task = task,
> >>> >> >> >   cluster = 'west',
> >>> >> >> >   role = 'service-account-name',
> >>> >> >> >   environment = 'prod',
> >>> >> >> >   name = '{{profile.queue_name}}_processor',
> >>> >> >> >   instances = '{{profile.instances}}',  # Scale here.
> >>> >> >> > )
> >>> >> >> >
> >>> >> >> > jobs = [
> >>> >> >> >   service.bind(profile = Profile(
> >>> >> >> >     resources = HIGH_MEM,
> >>> >> >> >     queue_name = 'graph_traversals',
> >>> >> >> >     instances = 50,
> >>> >> >> >   )),
> >>> >> >> >   service.bind(profile = Profile(
> >>> >> >> >     resources = HIGH_CPU,
> >>> >> >> >     queue_name = 'compilations',
> >>> >> >> >     instances = 200,
> >>> >> >> >   )),
> >>> >> >> > ]
> >>> >> >> >
> >>> >> >> >
> >>> >> >> > On Wed, Feb 26, 2014 at 11:46 AM, Bryan Helmkamp <bryan@codeclimate.com> wrote:
> >>> >> >> >
> >>> >> >> >> Thanks, Bill.
> >>> >> >> >>
> >>> >> >> >> Am I correct in understanding that it is not possible to
> >>> >> >> >> parameterize individual Jobs, just Tasks? Therefore, since I
> >>> >> >> >> don't know the job definitions up front, I will have
> >>> >> >> >> parameterized Task templates, and generate a new Task every time
> >>> >> >> >> I need to run a Job?
> >>> >> >> >>
> >>> >> >> >> Is that the recommended route?
> >>> >> >> >>
> >>> >> >> >> Our work is very non-uniform so I don't think work-stealing
> >>> >> >> >> would be efficient for us.
> >>> >> >> >>
> >>> >> >> >> -Bryan
> >>> >> >> >>
> >>> >> >> >> On Wed, Feb 26, 2014 at 12:49 PM, Bill Farner <wfarner@apache.org> wrote:
> >>> >> >> >> > Thanks for checking out Aurora!
> >>> >> >> >> >
> >>> >> >> >> > My short answer is that Aurora should handle thousands of
> >>> >> >> >> > short-lived tasks/jobs per day without trouble.  (If you
> >>> >> >> >> > proceed with this approach and encounter performance issues,
> >>> >> >> >> > feel free to file tickets!)  The DSL does have some mechanisms
> >>> >> >> >> > for parameterization.  In your case, since you probably don't
> >>> >> >> >> > know all the job definitions upfront, you'll probably want to
> >>> >> >> >> > parameterize with environment variables.  I don't see this
> >>> >> >> >> > described in our docs, but there's a little detail at the
> >>> >> >> >> > option declaration [1].
> >>> >> >> >> >
> >>> >> >> >> > Another approach worth considering is work-stealing, using a
> >>> >> >> >> > single job as your pool of workers.  I would find this easier
> >>> >> >> >> > to manage, but it would only be suitable if your work items
> >>> >> >> >> > are sufficiently uniform.
> >>> >> >> >> >
> >>> >> >> >> > Feel free to continue the discussion!  We're also pretty
> >>> >> >> >> > active in our IRC channel if you'd prefer that medium.
> >>> >> >> >> >
> >>> >> >> >> >
> >>> >> >> >> > [1] https://github.com/apache/incubator-aurora/blob/master/src/main/python/apache/aurora/client/options.py#L170-L183
> >>> >> >> >> >
> >>> >> >> >> >
> >>> >> >> >> > -=Bill
> >>> >> >> >> >
> >>> >> >> >> >
> >>> >> >> >> > On Tue, Feb 25, 2014 at 10:11 PM, Bryan Helmkamp <bryan@codeclimate.com> wrote:
> >>> >> >> >> >
> >>> >> >> >> >> Hello,
> >>> >> >> >> >>
> >>> >> >> >> >> I am considering Aurora for a key component of our
> >>> >> >> >> >> infrastructure. Awesome work being done here.
> >>> >> >> >> >>
> >>> >> >> >> >> My question is: How suitable is Aurora for running
> >>> >> >> >> >> short-lived tasks?
> >>> >> >> >> >>
> >>> >> >> >> >> Background: We (Code Climate) do static analysis of tens of
> >>> >> >> >> >> thousands of repositories every day. We run a variety of
> >>> >> >> >> >> forms of analysis, with heterogeneous resource requirements,
> >>> >> >> >> >> and thus our interest in Mesos.
> >>> >> >> >> >>
> >>> >> >> >> >> Looking at Aurora, a lot of the core features look very
> >>> >> >> >> >> helpful to us. Where I am getting hung up is figuring out how
> >>> >> >> >> >> to model short-lived tasks as tasks/jobs. Long-running
> >>> >> >> >> >> resource allocations are not really an option for us due to
> >>> >> >> >> >> the variation in our workloads.
> >>> >> >> >> >>
> >>> >> >> >> >> My first thought was to create a Task for each type of
> >>> >> >> >> >> analysis we run, and then start a new Job with the
> >>> >> >> >> >> appropriate Task every time we want to run analysis
> >>> >> >> >> >> (regulated by a queue). This doesn't seem to work though. I
> >>> >> >> >> >> can't `aurora create` the same `.aurora` file multiple times
> >>> >> >> >> >> with different Job names (as far as I can tell). Also there
> >>> >> >> >> >> is the problem of how to customize each Job slightly (e.g. a
> >>> >> >> >> >> payload).
> >>> >> >> >> >>
> >>> >> >> >> >> An obvious alternative is to create a unique Task every time
> >>> >> >> >> >> we want to run work. This would result in tens of thousands
> >>> >> >> >> >> of tasks being created every day, and from what I can tell
> >>> >> >> >> >> Aurora does not intend to be used like that. (Please correct
> >>> >> >> >> >> me if I am wrong.)
> >>> >> >> >> >>
> >>> >> >> >> >> Basically, I would like to hook my job queue up to Aurora to
> >>> >> >> >> >> perform the actual work. There are a dozen different types of
> >>> >> >> >> >> jobs, each with different performance requirements. Every
> >>> >> >> >> >> time a job runs, it has a unique payload containing the
> >>> >> >> >> >> definition of the work that should be performed.
> >>> >> >> >> >>
> >>> >> >> >> >> Can Aurora be used this way? If so, what is the proper way to
> >>> >> >> >> >> model this with respect to Jobs and Tasks?
> >>> >> >> >> >>
> >>> >> >> >> >> Any/all help is appreciated.
> >>> >> >> >> >>
> >>> >> >> >> >> Thanks!
> >>> >> >> >> >>
> >>> >> >> >> >> -Bryan
> >>> >> >> >> >>
> >>> >> >> >> >> --
> >>> >> >> >> >> Bryan Helmkamp, Founder, Code Climate
> >>> >> >> >> >> bryan@codeclimate.com / 646-379-1810 / @brynary
> >>> >> >> >> >>
> >>> >> >> >>
> >>> >> >> >>
> >>> >> >> >>
> >>> >> >> >> --
> >>> >> >> >> Bryan Helmkamp, Founder, Code Climate
> >>> >> >> >> bryan@codeclimate.com / 646-379-1810 / @brynary
> >>> >> >> >>
> >>> >> >>
> >>> >> >>
> >>> >> >>
> >>> >> >> --
> >>> >> >> Bryan Helmkamp, Founder, Code Climate
> >>> >> >> bryan@codeclimate.com / 646-379-1810 / @brynary
> >>> >> >>
> >>> >>
> >>> >>
> >>> >>
> >>> >> --
> >>> >> Bryan Helmkamp, Founder, Code Climate
> >>> >> bryan@codeclimate.com / 646-379-1810 / @brynary
> >>> >>
> >>>
> >>>
> >>>
> >>> --
> >>> Bryan Helmkamp, Founder, Code Climate
> >>> bryan@codeclimate.com / 646-379-1810 / @brynary
> >>>
> >>
> >>
>
>
>
> --
> Bryan Helmkamp, Founder, Code Climate
> bryan@codeclimate.com / 646-379-1810 / @brynary
>
