aurora-issues mailing list archives

From "Stephan Erb (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (AURORA-1258) Improve procedure for adding instances to a job
Date Mon, 04 Jan 2016 17:32:39 GMT

    [ https://issues.apache.org/jira/browse/AURORA-1258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15081416#comment-15081416 ]

Stephan Erb commented on AURORA-1258:
-------------------------------------

[~tonydong3] We have implemented a very rudimentary version of a scaling command in a thin
wrapper around the Python client (which we install via the Aurora Python sdist). Maybe this
helps to get the discussion going. The entire feature described by [~yasumoto] would be more
involved.

The stripped-down, relevant code follows. The field {{self._api}} is of type {{apache.aurora.client.api.AuroraClientAPI}}.
{code}
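# Imports this excerpt relies on (module paths as shipped with the Aurora Python client).
from apache.aurora.client.api.updater_util import UpdaterConfig
from gen.apache.aurora.api.constants import ACTIVE_STATES
from gen.apache.aurora.api.ttypes import JobUpdateRequest, TaskQuery

# The methods below belong to the wrapper class.
    def scale_to(self, jobkey, num_instances):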
        """
        Scale instance count.

        Be aware:
          * implicitly assumes that all tasks are running with the same task config
          * subject to race conditions when the job is modified concurrently
            (e.g., kill_job between task config fetch and update)
        """
        query = TaskQuery(jobKeys=[jobkey.to_thrift()], limit=1, statuses=ACTIVE_STATES)
        resp = self._api.query(query)
        self._validate_response(resp)

        if not resp.result.scheduleStatusResult.tasks:
            raise LookupError("Unable to scale job %s. No jobconfig found." % jobkey)

        task_config = resp.result.scheduleStatusResult.tasks[0].assignedTask.task
        self._start_update(jobkey, task_config, num_instances)

    def _start_update(self, jobkey, task_config, num_instances):
        update_settings = UpdaterConfig(**self._update_config).to_thrift_update_settings()
        request = JobUpdateRequest(
            instanceCount=num_instances,
            settings=update_settings,
            taskConfig=task_config)
        resp = self._api.scheduler_proxy.startJobUpdate(
            request, "Scale to %s instances" % num_instances)
        self._validate_response(resp)
{code}
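
One way to soften the first caveat in the docstring would be to fetch all active tasks (i.e., drop {{limit=1}} from the query) and refuse to scale when their configs differ. The following is only a minimal, self-contained sketch of that check: {{FakeTask}}, {{distinct_configs}} and {{pick_uniform_config}} are hypothetical names, and plain dicts stand in for the Thrift task configs, so none of this is Aurora API.

```python
from collections import namedtuple

# Hypothetical stand-in for a scheduled task. The real client returns Thrift
# objects (task.assignedTask.task), which are not modeled here.
FakeTask = namedtuple("FakeTask", ["instance_id", "config"])


def distinct_configs(tasks):
    """Return the distinct configs seen across tasks, in first-seen order."""
    seen = []
    for task in tasks:
        if task.config not in seen:
            seen.append(task.config)
    return seen


def pick_uniform_config(tasks):
    """Return the single config shared by all tasks, or raise if they differ."""
    if not tasks:
        raise LookupError("No active tasks found.")
    configs = distinct_configs(tasks)
    if len(configs) != 1:
        raise ValueError(
            "Refusing to scale: %d distinct task configs found." % len(configs))
    return configs[0]


# Three instances with identical configs, plus one deviating instance.
uniform = [FakeTask(i, {"cpu": 1.0, "ram_mb": 512}) for i in range(3)]
mixed = uniform + [FakeTask(3, {"cpu": 2.0, "ram_mb": 512})]

chosen = pick_uniform_config(uniform)
try:
    pick_uniform_config(mixed)
    mixed_rejected = False
except ValueError:
    mixed_rejected = True
```

This does nothing about the second caveat: the job can still change between the query and the call to {{startJobUpdate}}.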

We would happily drop our custom implementation in favor of something more sane. Feel free
to give it a shot :-)

> Improve procedure for adding instances to a job
> -----------------------------------------------
>
>                 Key: AURORA-1258
>                 URL: https://issues.apache.org/jira/browse/AURORA-1258
>             Project: Aurora
>          Issue Type: Story
>          Components: Reliability, Usability
>            Reporter: Joe Smith
>
> The current process for adding instances to a job is highly manual, and potentially dangerous.
> 1. Take a config for a job with 10 instances, update it to 20 instances.
> 2. The batch size will be increased, and users will need to specify shards 10 to 19.
> 3. After this update is complete, users will need to manually update shards 0-9 again.
> There may be other changes pulled in as part of this update other than just increasing
> the number of instances, which could further complicate things.
> One possible improvement would be to change the updater from 'under-provision' where
> it kills instances first, then schedules new instances, to an 'over-provision' where it adds
> on new instances, then backpedals and kills the old instances.
> Overall, a single command or process for a user to take an already-existing job and increase
> the number of instances would reduce overhead and fat-fingering.
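
For illustration, the shard arithmetic behind steps 1 to 3 of the quoted description can be sketched as follows. {{plan_scale_up}} is a hypothetical helper, not part of the Aurora client; it only computes which instance IDs are new and which already exist.

```python
def plan_scale_up(current_count, target_count):
    """Split instance IDs into those that already exist and those to be added."""
    if target_count < current_count:
        raise ValueError("target_count must be >= current_count")
    existing = list(range(0, current_count))
    added = list(range(current_count, target_count))
    return existing, added


# The example from the issue: a job going from 10 to 20 instances.
existing, added = plan_scale_up(10, 20)
print("first update (new shards): %d-%d" % (added[0], added[-1]))
print("second update (old shards): %d-%d" % (existing[0], existing[-1]))
```

For the 10 to 20 case this prints shards 10-19 for the first update and 0-9 for the second, matching the manual steps.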



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
