aurora-dev mailing list archives

From Maxim Khutornenko <ma...@apache.org>
Subject Re: Support instance-specific TaskConfig in CreateJob API
Date Mon, 15 Aug 2016 17:05:37 GMT
I would love to hear more about constraint use cases that don't work across
jobs to see if/how we can extend Aurora to support them.

As far as heterogeneous jobs go, that effort would require rethinking quite
a few assumptions around fundamental Aurora principles to ensure we don't
paint ourselves into a corner with respect to future features by accepting
an "easy to do" change short-term. I am -1 on supporting anything specific
to adhoc jobs only. IMO, this has to be an all-or-nothing feature adding
support for heterogeneous jobs across the stack.

If you guys feel strongly about this idea, please craft a high-level design
summary for the community to explore and review.

On Sat, Aug 13, 2016 at 7:43 AM, Mauricio Garavaglia <
mauriciogaravaglia@gmail.com> wrote:

> Hi,
>
> We have been experimenting with the idea of having heterogeneous tasks in a
> job, mainly to support different docker container configurations (like
> volumes to let tasks have different storage, different labels for logging
> purposes, or IP addresses).
> The main reason for using this instead of separate jobs is that scheduling
> constraints don't work across jobs, and we may want to have rack
> anti-affinity for the different instances.
>
> You can check how it works in the README in the repo
> [https://github.com/medallia/aurora/tree/0.13.0-medallia]. Basically, the
> job includes a list of parameters that are later interpolated into the task
> config during Mesos task creation, so this happens at a later time and the
> different values to apply to each instance are held in the config. We can
> start discussing whether you think the design is sound or the feature could
> be helpful, and start working to move it upstream.
>
> We used StartJobUpdate to achieve the same purpose, but it required
> some gymnastics during deployment that we wanted to avoid. Regarding Min
> Cai's issue about short-lived tasks finishing before the update starts, we
> solved it by initially configuring all the tasks with a dummy NOP ("no
> operation") process that just sits there waiting to be updated.
>
> Mauricio
>
>
> On Fri, Aug 12, 2016 at 3:17 PM, Min Cai <mincai@gmail.com> wrote:
>
> > Thanks Maxim. Please see my previous email replying to David's comments
> > for a more detailed response.
> >
> > On Fri, Aug 12, 2016 at 9:24 AM, Maxim Khutornenko <maxim@apache.org>
> > wrote:
> >
> > > I am cautious about merging createJob and startJobUpdate as we don't
> > > support updates of adhoc jobs. It's logically unclear what adhoc job
> > > update would mean as adhoc job instances are not intended to survive
> > > terminal state.
> > >
> >
> > +1. Our adhoc job instances could be short-lived and finish well before
> > any StartJobUpdate calls are made to Aurora.
> >
> >
> > >
> > > Even if we decided to do so I am afraid it would not help with the
> > > scenario of creating a new heterogeneous job as the updater only
> > > supports a single TaskConfig target.
> > >
> >
> > We will have to make N StartJobUpdate calls to update N distinct task
> > configs, which will be expensive when N is large (e.g. > 10K).
> >
> >
> > >
> > > Speaking broadly, Aurora is built around the idea of homogeneous jobs.
> > > It's possible to have different task configs to support canaries and
> > > update rolls but we treat that state as *temporary* until config
> > > reconciliation completes.
> > >
> >
> > Agreed that homogeneous jobs are an important design consideration for
> > *long-running* jobs like Services. However, most adhoc jobs are
> > heterogeneous by nature. For example, they might need to process different
> > input files and write to different output files, or they might take
> > different parameters. It would be nice to extend Aurora to support
> > heterogeneous tasks so that it can be used for broader use cases as a
> > meta-scheduler.
> >
> > Thanks, - Min
> >
> >
> > > On Fri, Aug 12, 2016 at 8:03 AM, David McLaughlin
> > > <dmclaughlin@apache.org> wrote:
> > >
> > > > Hi Min,
> > > >
> > > > I'd prefer to add support for ad-hoc jobs to startJobUpdate and
> > > > completely remove the notion of job create.
> > > >
> > > > "Also, even the StartJobUpdate API is not scalable to a job with 10K ~
> > > > 100K task instances and each instance has different task config since
> > > > we will have to invoke StartJobUpdate for each distinct task config."
> > > >
> > > >
> > > > What is the use case for that? Aurora was designed to have those as
> > > > separate jobs.
> > > >
> > > > Thanks,
> > > > David
> > > >
> > > > On Thu, Aug 11, 2016 at 2:56 PM, Min Cai <mincai@gmail.com> wrote:
> > > >
> > > > > Hey fellow Aurora team:
> > > > >
> > > > > > We would like to propose a simple and backwards-compatible feature
> > > > > > in the CreateJob API so that we can support instance-specific
> > > > > > TaskConfigs. The use case here is an adhoc job which has different
> > > > > > resource settings as well as different command line arguments for
> > > > > > each task instance. Aurora today already supports heterogeneous
> > > > > > tasks for the same job via the StartJobUpdate API, i.e. we can
> > > > > > update the job instances to use different task configs. This works
> > > > > > reasonably well for long-running tasks like Services. However, it is
> > > > > > not feasible for adhoc jobs where each task will finish right away,
> > > > > > before we even have a chance to invoke StartJobUpdate. Also, even
> > > > > > the StartJobUpdate API does not scale to a job with 10K ~ 100K task
> > > > > > instances where each instance has a different task config, since we
> > > > > > will have to invoke StartJobUpdate for each distinct task config.
> > > > >
> > > > > > The proposal we have is to add an optional field in JobConfiguration
> > > > > > for instance-specific task configs. It will override the default
> > > > > > task config for the given instance ID ranges if specified.
> > > > > > Otherwise, everything will be backwards compatible with the current
> > > > > > API. The implementation of this change also seems to be very simple.
> > > > > > We only need to plumb instance-specific task configs through when we
> > > > > > call statemanager.insertPendingTasks in the
> > > > > > SchedulerThriftInterface.createJob function.
> > > > >
> > > > >  /**
> > > > >   * Description of an Aurora job. One task will be scheduled for
> each
> > > > > instance within the job.
> > > > >   */
> > > > > @@ -328,13 +343,17 @@ struct JobConfiguration {
> > > > >    4: string cronSchedule
> > > > >    /** Collision policy to use when handling overlapping cron runs.
> > > > > Default is KILL_EXISTING. */
> > > > >    5: CronCollisionPolicy cronCollisionPolicy
> > > > > -  /** Task configuration for this job. */
> > > > > +  /** Default task configuration for all instances of this job.
*/
> > > > >    6: TaskConfig taskConfig
> > > > >    /**
> > > > >     * The number of instances in the job. Generated instance IDs
> for
> > > > tasks
> > > > > will be in the range
> > > > >     * [0, instances).
> > > > >     */
> > > > >    8: i32 instanceCount
> > > > > +  /**
> > > > > +   * The instance specific task configs that override the default
> > task
> > > > > config for given
> > > > > +   * instanceId ranges.
> > > > > +   */
> > > > > +  10: optional set<InstanceTaskConfig> instanceTaskConfigs
> > > > >  }
> > > > >
> > > > > Please let us know your comments and suggestions.
> > > > >
> > > > > Thanks, - Min
> > > > >
> > > >
> > >
> >
>
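
For illustration, the override semantics in Min's proposal (a default task
config, replaced for given instance ID ranges when an override is specified)
can be sketched in Python. This is a minimal sketch under stated assumptions,
not Aurora code: the InstanceTaskConfig shape used here (a task plus a list of
inclusive ID ranges), the resolve_task_configs helper, and the choice to reject
overlapping or out-of-bounds ranges are all hypothetical, since the thread does
not define them. Task configs are modeled as plain dicts.

```python
from dataclasses import dataclass
from typing import Any, Dict, Iterable, List, Tuple


@dataclass
class InstanceTaskConfig:
    """Hypothetical override: one task config applied to inclusive ID ranges."""
    task: Dict[str, Any]
    ranges: List[Tuple[int, int]]  # inclusive (first, last) instance IDs


def resolve_task_configs(
    default_task: Dict[str, Any],
    instance_count: int,
    overrides: Iterable[InstanceTaskConfig] = (),
) -> Dict[int, Dict[str, Any]]:
    """Map every instance ID in [0, instance_count) to its task config."""
    # Start with the default config for every instance, so a job without
    # overrides behaves exactly like a homogeneous job.
    resolved = {i: default_task for i in range(instance_count)}
    overridden = set()
    for override in overrides:
        for first, last in override.ranges:
            if first < 0 or last >= instance_count or first > last:
                raise ValueError(f"invalid range [{first}, {last}]")
            for instance_id in range(first, last + 1):
                # Assumption: two overrides for the same instance are an error.
                if instance_id in overridden:
                    raise ValueError(f"instance {instance_id} overridden twice")
                overridden.add(instance_id)
                resolved[instance_id] = override.task
    return resolved
```

With no overrides the function returns the default config for every instance,
which mirrors the backwards-compatibility claim in the proposal.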

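
The parameter-interpolation approach Mauricio describes (one shared task
config plus a list of per-instance parameters substituted at task creation
time) can be sketched in a similar way. This is a rough sketch only: the
${name} placeholder syntax, the render_instance_tasks helper, and the dict
representation of a task config are assumptions made for illustration and are
not taken from the Medallia fork.

```python
from string import Template
from typing import Any, Dict, List


def render_instance_tasks(
    task_template: Dict[str, Any],
    instance_params: List[Dict[str, str]],
) -> List[Dict[str, Any]]:
    """Render one task config per instance from a shared template.

    instance_params[i] holds the values substituted for instance i.
    """
    rendered = []
    for params in instance_params:
        task = {}
        for key, value in task_template.items():
            if isinstance(value, str):
                # substitute() raises KeyError if a placeholder has no value,
                # so a missing per-instance parameter fails loudly.
                task[key] = Template(value).substitute(params)
            else:
                # Non-string fields (e.g. resource counts) are shared as-is.
                task[key] = value
        rendered.append(task)
    return rendered
```

Because every instance shares the same template, the job itself stays
homogeneous at the config level and only the interpolated values differ.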