aurora-dev mailing list archives

From Mauricio Garavaglia <mauriciogaravag...@gmail.com>
Subject Re: Support instance-specific TaskConfig in CreateJob API
Date Sat, 13 Aug 2016 14:43:02 GMT
Hi,

We have been experimenting with the idea of having heterogeneous tasks in a
job, mainly to support different Docker container configurations (such as
volumes that give tasks different storage, different labels for logging
purposes, or IP addresses).
The main reason for using this instead of separate jobs is that scheduling
constraints don't work across jobs, and we may want rack
anti-affinity across the different instances.

You can check how it works in the README in the repo [
https://github.com/medallia/aurora/tree/0.13.0-medallia]. Basically, the job
includes a list of parameters that are later interpolated into the task
config during Mesos task creation, so the substitution happens at a later
time and the different values to apply to each instance are held in the
config. If you think the design is sound or the feature could be helpful,
we can start discussing it and work on moving it upstream.
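
To illustrate the interpolation idea, here is a minimal Python sketch. It is not the code from the repo: the template fields, the parameter names, and the helpers `render_task_config` and `render_job` are all hypothetical, and the actual syntax is the one described in the README.

```python
from string import Template

# Hypothetical task config template; field names are illustrative only.
TASK_TEMPLATE = {
    "volume": "/data/$volume",
    "label": "logging=$log_label",
    "ip": "$ip",
}

# Hypothetical per-instance parameters, one dict per instance ID (list index).
INSTANCE_PARAMS = [
    {"volume": "vol-a", "log_label": "shard-0", "ip": "10.0.0.10"},
    {"volume": "vol-b", "log_label": "shard-1", "ip": "10.0.0.11"},
]


def render_task_config(template, params):
    """Substitute one instance's parameters into every template value."""
    return {key: Template(value).substitute(params) for key, value in template.items()}


def render_job(template, instance_params):
    """Render one concrete config per instance, keyed by instance ID."""
    return {
        instance_id: render_task_config(template, params)
        for instance_id, params in enumerate(instance_params)
    }


configs = render_job(TASK_TEMPLATE, INSTANCE_PARAMS)
```

The job carries a single template plus a small parameter list, and each instance only gets its concrete values when its task is created.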

We used StartJobUpdate to achieve the same purpose, but it required
some gymnastics during deployment that we wanted to avoid. Regarding Min
Cai's issue about short-lived tasks finishing before the update starts, we
solved it by initially configuring all the tasks with a dummy NOP ("no
operation") process that just sits there waiting to be updated.

Mauricio


On Fri, Aug 12, 2016 at 3:17 PM, Min Cai <mincai@gmail.com> wrote:

> Thanks Maxim. Please see my previous email to David's comments for more
> detailed response.
>
> On Fri, Aug 12, 2016 at 9:24 AM, Maxim Khutornenko <maxim@apache.org>
> wrote:
>
> > I am cautious about merging createJob and startJobUpdate as we don't
> > support updates of adhoc jobs. It's logically unclear what adhoc job update
> > would mean as adhoc job instances are not intended to survive terminal
> > state.
> >
>
> +1. Our adhoc job instances could be short-lived and finished way before
> StartJobUpdate calls are made to Aurora.
>
>
> >
> > Even if we decided to do so I am afraid it would not help with the scenario
> > of creating a new heterogeneous job as the updater only supports a single
> > TaskConfig target.
> >
>
> We will have to make N StartJobUpdate calls to update N distinct task
> configs so it will be expensive if N is large like > 10K.
>
>
> >
> > Speaking broadly, Aurora is built around the idea of homogenous jobs. It's
> > possible to have different task configs to support canaries and update
> > rolls but we treat that state as *temporary* until config reconciliation
> > completes.
> >
>
> Agreed that the homogeneous jobs are important design consideration for
> *long-running* jobs like Services. However, most adhoc jobs are
> heterogenous by nature. For example, they might need to process different
> input files and write to different output files. Or they might take
> different parameters etc. It would be nice to extend Aurora to support
> heterogenous tasks so that it can be used for broader use cases as a
> meta-scheduler.
>
> Thanks, - Min
>
>
> > On Fri, Aug 12, 2016 at 8:03 AM, David McLaughlin <dmclaughlin@apache.org>
> > wrote:
> >
> > > Hi Min,
> > >
> > > I'd prefer to add support for ad-hoc jobs to startJobUpdate and completely
> > > remove the notion of job create.
> > >
> > > " Also, even the
> > > > StartJobUpdate API is not scalable to a job with 10K ~ 100K task instances
> > > > and each instance has different task config since we will have to invoke
> > > > StartJobUpdate for each distinct task config."
> > >
> > >
> > > What is the use case for that? Aurora was designed to have those as
> > > separate jobs.
> > >
> > > Thanks,
> > > David
> > >
> > > On Thu, Aug 11, 2016 at 2:56 PM, Min Cai <mincai@gmail.com> wrote:
> > >
> > > > Hey fellow Aurora team:
> > > >
> > > > > We would like to propose a simple and backwards compatible feature in
> > > > > CreateJob API so that we can support instance-specific TaskConfigs. The use
> > > > > case here is for an Adhoc job which has different resource settings as well
> > > > > as different command line arguments for each task instance. Aurora today
> > > > > already support heterogenous tasks for the same job via StartJobUpdate API,
> > > > > i.e. we can update the job instances to use different task configs. This
> > > > > works reasonably well for long running tasks like Services. However, it is
> > > > > not feasible for Adhoc jobs where each task will finish right away before
> > > > > we even have a chance to invoke StartJobUpdate. Also, even the
> > > > > StartJobUpdate API is not scalable to a job with 10K ~ 100K task instances
> > > > > and each instance has different task config since we will have to invoke
> > > > > StartJobUpdate for each distinct task config.
> > > >
> > > > > The proposal we have is to add an optional field in JobConfiguration for
> > > > > instance specific task config. It will be override the default task config
> > > > > for given instance ID ranges if specific. Otherwise, everything will be
> > > > > backwards compatibility as current API. The implementation of this change
> > > > > also seems to be very simple. We only need to plumb instance specific tasks
> > > > > configs when we call statemanager.insertPendingTasks in
> > > > > SchedulerThriftInterface.createJob function.
> > > >
> > > >  /**
> > > >   * Description of an Aurora job. One task will be scheduled for each
> > > > instance within the job.
> > > >   */
> > > > @@ -328,13 +343,17 @@ struct JobConfiguration {
> > > >    4: string cronSchedule
> > > >    /** Collision policy to use when handling overlapping cron runs.
> > > > Default is KILL_EXISTING. */
> > > >    5: CronCollisionPolicy cronCollisionPolicy
> > > > -  /** Task configuration for this job. */
> > > > +  /** Default task configuration for all instances of this job. */
> > > >    6: TaskConfig taskConfig
> > > >    /**
> > > >     * The number of instances in the job. Generated instance IDs for
> > > tasks
> > > > will be in the range
> > > >     * [0, instances).
> > > >     */
> > > >    8: i32 instanceCount
> > > > +  /**
> > > > +   * The instance specific task configs that override the default
> task
> > > > config for given
> > > > +   * instanceId ranges.
> > > > +   */
> > > > +  10: optional set<InstanceTaskConfig> instanceTaskConfigs
> > > >  }
> > > >
> > > > Please let us know your comments and suggestions.
> > > >
> > > > Thanks, - Min
> > > >
> > >
> >
>
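
For reference, the override semantics in the quoted proposal (a default task config, replaced for given instance ID ranges) could be sketched roughly as follows. This is only an illustration in Python, not Aurora's scheduler code: the `InstanceTaskConfig` shape with inclusive (first, last) ranges, the string stand-ins for task configs, and the `resolve_configs` helper are all assumptions.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple


@dataclass(frozen=True)
class InstanceTaskConfig:
    # Assumed shape: inclusive (first, last) instance ID ranges plus the
    # config that replaces the default for those instances.
    ranges: Tuple[Tuple[int, int], ...]
    task_config: str  # stand-in for a real TaskConfig


def resolve_configs(
    default_config: str,
    instance_count: int,
    overrides: Iterable[InstanceTaskConfig] = (),
) -> Dict[int, str]:
    """Return the effective config for every instance ID in [0, instance_count)."""
    # Start from the default so that a job with no overrides behaves as today.
    resolved = {i: default_config for i in range(instance_count)}
    for override in overrides:
        for first, last in override.ranges:
            if first < 0 or last >= instance_count or first > last:
                raise ValueError(f"invalid instance range: ({first}, {last})")
            for instance_id in range(first, last + 1):
                resolved[instance_id] = override.task_config
    return resolved


effective = resolve_configs(
    "default", 5, [InstanceTaskConfig(ranges=((1, 2),), task_config="big")]
)
```

With no overrides, every instance resolves to the default config, which matches the backwards-compatibility claim in the proposal. How overlapping ranges should be handled is not specified in the thread; this sketch simply lets the last override applied win.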
