aurora-issues mailing list archives

From "Stephan Erb (JIRA)" <>
Subject [jira] [Commented] (AURORA-1258) Improve procedure for adding instances to a job
Date Mon, 04 Jan 2016 17:32:39 GMT


Stephan Erb commented on AURORA-1258:

[~tonydong3] we have implemented a very rudimentary version of a scaling command in a thin
wrapper around the Python client (which we install via the Aurora Python sdist). Maybe this
helps to get the discussion going. The entire feature described by [~yasumoto] would be more

Stripped down/relevant code follows below. The field {{self._api}} is of type {{apache.aurora.client.api.AuroraClientAPI}}.
    def scale_to(self, jobkey, num_instances):
        """Scale instance count.

        Be aware:
          * implicitly assumes that all tasks are running with the same task config
          * subject to race conditions when jobs are modified concurrently
            (e.g., kill_job between task config fetch and update)
        """
        query = TaskQuery(jobKeys=[jobkey.to_thrift()], limit=1, statuses=ACTIVE_STATES)
        resp = self._api.query(query)

        if not resp.result.scheduleStatusResult.tasks:
            raise LookupError("Unable to scale job %s. No jobconfig found." % jobkey)

        task_config = resp.result.scheduleStatusResult.tasks[0].assignedTask.task
        self._start_update(jobkey, task_config, num_instances)

    def _start_update(self, jobkey, task_config, num_instances):
        update_settings = UpdaterConfig(**self._update_config).to_thrift_update_settings()
        request = JobUpdateRequest(instanceCount=num_instances, settings=update_settings,
        resp = self._api.scheduler_proxy.startJobUpdate(request, "Scale to %s instances" %
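
For readers without an Aurora installation, the control flow of {{scale_to}} can be sketched against an in-memory stand-in. This is only an illustration under stated assumptions: {{FakeScheduler}} and its methods are hypothetical and are not part of the Aurora client API. The sketch shows the two steps (fetch the config of one active task, then request an update with the new instance count) and where the race window mentioned in the docstring sits.

```python
from dataclasses import dataclass, field


@dataclass
class FakeScheduler:
    """Hypothetical in-memory stand-in for the scheduler (not an Aurora class)."""
    # Maps a job key string to the task configs of its active tasks.
    active_tasks: dict = field(default_factory=dict)
    # Records every update request as (jobkey, task_config, num_instances).
    updates: list = field(default_factory=list)

    def first_active_task_config(self, jobkey):
        tasks = self.active_tasks.get(jobkey, [])
        return tasks[0] if tasks else None

    def start_update(self, jobkey, task_config, num_instances):
        self.updates.append((jobkey, task_config, num_instances))


def scale_to(scheduler, jobkey, num_instances):
    """Reuse the config of one active task and request a new instance count."""
    task_config = scheduler.first_active_task_config(jobkey)
    if task_config is None:
        raise LookupError("Unable to scale job %s. No jobconfig found." % jobkey)
    # Race window: the job may be killed or changed between the fetch above
    # and the update request below.
    scheduler.start_update(jobkey, task_config, num_instances)
    return task_config


if __name__ == "__main__":
    scheduler = FakeScheduler(active_tasks={"example/job": [{"cpu": 1.0}]})
    scale_to(scheduler, "example/job", 20)
    print(scheduler.updates)
```

Because only the first active task is inspected, a job whose tasks run with different configs would be scaled using whichever config happens to come back first.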

We would happily drop our custom implementation in favor of something more sane. Feel free
to give it a shot :-)

> Improve procedure for adding instances to a job
> -----------------------------------------------
>                 Key: AURORA-1258
>                 URL:
>             Project: Aurora
>          Issue Type: Story
>          Components: Reliability, Usability
>            Reporter: Joe Smith
> The current process for adding instances to a job is highly manual, and potentially dangerous.
> 1. Take a config for a job with 10 instances, update it to 20 instances.
> 2. The batch size will be increased, and users will need to specify shards 10 to 19.
> 3. After this update is complete, users will need to manually update shards 0-9 again.
> There may be other changes pulled in as part of this update other than just increasing
> the number of instances, which could further complicate things.
> One possible improvement would be to change the updater from 'under-provision' where
> it kills instances first, then schedules new instances, to an 'over-provision' where it adds
> on new instances, then backpedals and kills the old instances.
> Overall, a single command or process for a user to take an already-existing job and increase
> the number of instances would reduce overhead and fat-fingering.
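
The arithmetic behind the issue description above can be sketched in a few lines. Both function names are hypothetical and nothing here calls Aurora. The first function computes which shards belong to each pass of the manual procedure; the second shows the running-instance count while one batch is replaced under each updater strategy.

```python
def manual_scale_up_plan(old_count, new_count):
    """Return (new_shards, existing_shards) for the manual two-pass procedure."""
    if new_count <= old_count:
        raise ValueError("new_count must be greater than old_count")
    new_shards = list(range(old_count, new_count))  # pass 1: add these shards
    existing_shards = list(range(old_count))        # pass 2: update these again
    return new_shards, existing_shards


def capacity_during_batch(running, batch_size, over_provision):
    """Return the running-instance counts seen while one batch is replaced."""
    if over_provision:
        # Add the replacements first, then kill the old instances.
        return [running, running + batch_size, running]
    # Kill the old instances first, then schedule the replacements.
    return [running, running - batch_size, running]


if __name__ == "__main__":
    print(manual_scale_up_plan(10, 20))
    print(capacity_during_batch(10, 2, over_provision=False))  # [10, 8, 10]
    print(capacity_during_batch(10, 2, over_provision=True))   # [10, 12, 10]
```

Going from 10 to 20 instances yields shards 10 to 19 for the first pass and shards 0 to 9 for the second, matching the steps in the description. With a batch size of 2, under-provisioning dips to 8 running instances, while over-provisioning never drops below 10.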

This message was sent by Atlassian JIRA
