aurora-commits mailing list archives

From san...@apache.org
Subject svn commit: r1799392 [14/14] - in /aurora/site: publish/blog/aurora-0-18-0-released/ publish/documentation/0.18.0/ publish/documentation/0.18.0/additional-resources/ publish/documentation/0.18.0/additional-resources/presentations/ publish/documentation...
Date Wed, 21 Jun 2017 06:36:25 GMT
Added: aurora/site/source/documentation/0.18.0/reference/task-lifecycle.md
URL: http://svn.apache.org/viewvc/aurora/site/source/documentation/0.18.0/reference/task-lifecycle.md?rev=1799392&view=auto
==============================================================================
--- aurora/site/source/documentation/0.18.0/reference/task-lifecycle.md (added)
+++ aurora/site/source/documentation/0.18.0/reference/task-lifecycle.md Wed Jun 21 06:36:21 2017
@@ -0,0 +1,148 @@
+# Task Lifecycle
+
+When Aurora reads a configuration file and finds a `Job` definition, it:
+
+1.  Evaluates the `Job` definition.
+2.  Splits the `Job` into its constituent `Task`s.
+3.  Sends those `Task`s to the scheduler.
+4.  The scheduler then puts the `Task`s into `PENDING` state, starting each
+    `Task`'s lifecycle.
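+
+As an illustrative sketch of steps 1 and 2 (following Aurora's configuration
+DSL; the cluster, role, and names here are made-up placeholders), a `Job`
+with `instances = 2` is split into two `Task`s:
+
+```python
+# hello_world.aurora -- a minimal sketch, not a canonical example.
+hello = Process(name = 'hello', cmdline = 'echo hello world')
+
+hello_task = Task(
+  processes = [hello],
+  resources = Resources(cpu = 0.1, ram = 16*MB, disk = 16*MB))
+
+jobs = [Job(
+  cluster = 'devcluster',
+  environment = 'devel',
+  role = 'www-data',
+  name = 'hello_world',
+  task = hello_task,
+  instances = 2)]  # evaluated into two Tasks, each with its own lifecycle
+```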
+
+
+![Life of a task](../images/lifeofatask.png)
+
+Please note that a couple of the task states described below are missing
+from this state diagram.
+
+
+## PENDING to RUNNING states
+
+When a `Task` is in the `PENDING` state, the scheduler constantly
+searches for machines satisfying that `Task`'s resource requirements
+(RAM, disk space, CPU time) while honoring configuration constraints
+such as "a `Task` must run on machines dedicated to a particular role"
+or attribute limit constraints such as "at most 2 `Task`s from the same
+`Job` may run on each rack". When the scheduler finds a suitable match,
+it assigns the `Task` to a machine and puts the `Task` into the
+`ASSIGNED` state.
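+
+As a sketch of what such constraints look like in a `Job` definition
+(using the documented `dedicated` and `limit:` constraint syntax; the
+role and values are made-up placeholders):
+
+```python
+jobs = [Job(
+  # ... cluster, environment, role, name, task as usual ...
+  constraints = {
+    'dedicated': 'www-data/web',  # only machines dedicated to this group
+    'rack': 'limit:2',            # at most 2 Tasks of this Job per rack
+  })]
+```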
+
+From the `ASSIGNED` state, the scheduler sends an RPC containing the
+`Task` configuration to the agent machine, which the agent uses to spawn
+an executor responsible for the `Task`'s lifecycle. When the scheduler
+receives an acknowledgment that the machine has accepted the `Task`,
+the `Task` goes into the `STARTING` state.
+
+The `STARTING` state initializes a `Task` sandbox. When the sandbox is
+fully initialized, Thermos begins to invoke `Process`es. The agent
+machine sends an update to the scheduler that the `Task` is in the
+`RUNNING` state only after the task satisfies its liveness requirements.
+See [Health Checking](../features/services#health-checking) for details
+on how to configure health checks.
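+
+For example, a service can gate the transition to `RUNNING` on an HTTP
+health check. A minimal sketch using the `HealthCheckConfig` schema (the
+values shown are illustrative, not required defaults):
+
+```python
+jobs = [Service(
+  # ... cluster, environment, role, name, task as usual ...
+  health_check_config = HealthCheckConfig(
+    initial_interval_secs = 15,     # grace period before the first check
+    interval_secs = 10,             # seconds between checks
+    max_consecutive_failures = 3))] # failures tolerated before restart
+```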
+
+
+
+## RUNNING to terminal states
+
+There are various ways that an active `Task` can transition into a
+terminal state. By definition, a `Task` can never leave a terminal
+state. However, depending on the nature of the termination and the
+originating `Job` definition (e.g. `service`, `max_task_failures`),
+a replacement `Task` might be scheduled.
+
+### Natural Termination: FINISHED, FAILED
+
+A `RUNNING` `Task` can terminate without direct user interaction. For
+example, it may be a finite computation that finishes, even something as
+simple as `echo hello world`, or it could be an exceptional condition in
+a long-lived service. If the `Task` is successful (its underlying
+processes have succeeded with exit status `0` or finished without
+reaching failure limits) it moves into the `FINISHED` state. If it
+finished after reaching a set of failure limits, it goes into the
+`FAILED` state.
+
+A terminated `Task` that is subject to rescheduling will be temporarily
+`THROTTLED` if it is considered to be flapping. A task is considered
+flapping if its previous invocation was terminated after less than 5
+minutes (scheduler default). The time penalty a task has to remain in
+the `THROTTLED` state before it is eligible for rescheduling increases
+with each consecutive failure.
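+
+The relevant thresholds are scheduler command line flags; the names below
+follow the scheduler configuration reference, but double-check them
+against your version:
+
+    -flapping_task_threshold=5mins       # runtime below which a task counts as flapping
+    -initial_flapping_task_delay=30secs  # first THROTTLED penalty
+    -max_flapping_task_delay=5mins       # upper bound for the growing penalty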
+
+### Forceful Termination: KILLING, RESTARTING
+
+You can terminate a `Task` by issuing an `aurora job kill` command, which
+moves it into the `KILLING` state. The scheduler then sends the agent a
+request to terminate the `Task`. If the scheduler receives a successful
+response, it moves the `Task` into the `KILLED` state and never restarts it.
+
+If a `Task` is forced into the `RESTARTING` state via the `aurora job restart`
+command, the scheduler kills the underlying task but in parallel schedules
+an identical replacement for it.
+
+In any case, the responsible executor on the agent follows an escalation
+sequence when killing a running task:
+
+  1. If an `HttpLifecycleConfig` is not present, skip to (4).
+  2. Send a POST to the `graceful_shutdown_endpoint` and wait 5 seconds.
+  3. Send a POST to the `shutdown_endpoint` and wait 5 seconds.
+  4. Send SIGTERM (`kill`) and wait at most `finalization_wait` seconds.
+  5. Send SIGKILL (`kill -9`).
+
+If the executor notices that all `Process`es in a `Task` have aborted
+during this sequence, it will not proceed with subsequent steps.
+Note that graceful shutdown is best-effort, and due to the many
+inevitable realities of distributed systems, it may not be performed.
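+
+As a sketch of how the endpoints in steps (2) and (3) and the wait in
+step (4) are configured (the endpoint paths shown are the documented
+defaults; `server` is a hypothetical `Process`):
+
+```python
+task = Task(
+  processes = [server],    # 'server': a hypothetical Process serving HTTP
+  finalization_wait = 30,  # seconds granted for step (4) before SIGKILL
+  resources = Resources(cpu = 1, ram = 128*MB, disk = 128*MB))
+
+jobs = [Service(
+  # ... cluster, environment, role, name as usual ...
+  task = task,
+  lifecycle = Lifecycle(http = HttpLifecycleConfig(
+    graceful_shutdown_endpoint = '/quitquitquit',
+    shutdown_endpoint = '/abortabortabort')))]
+```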
+
+### Unexpected Termination: LOST
+
+If a `Task` stays in a transient task state for too long (such as `ASSIGNED`
+or `STARTING`), the scheduler forces it into `LOST` state, creating a new
+`Task` in its place that's sent into `PENDING` state.
+
+In addition, if the Mesos core tells the scheduler that an agent has
+become unhealthy (or outright disappeared), the `Task`s assigned to that
+agent go into the `LOST` state and new `Task`s are created in their place.
+From `PENDING` state, there is no guarantee a `Task` will be reassigned
+to the same machine unless job constraints explicitly force it there.
+
+### Giving Priority to Production Tasks: PREEMPTING
+
+Sometimes a `Task` needs to be interrupted, such as when a non-production
+`Task`'s resources are needed by a higher priority production `Task`. This
+type of interruption is called *preemption*. When this happens in
+Aurora, the non-production `Task` is killed and moved into
+the `PREEMPTING` state when both of the following are true:
+
+- The task being killed is a non-production task.
+- The other task is a `PENDING` production task that hasn't been
+  scheduled due to a lack of resources.
+
+The scheduler UI shows that the non-production task was preempted in favor
+of the production task. At some point, tasks in `PREEMPTING` move to `KILLED`.
+
+Note that non-production tasks consuming many resources are likely to be
+preempted in favor of production tasks.
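+
+Whether a task counts as production is declared in its `Job` definition;
+a minimal sketch (production jobs additionally require quota granted by a
+cluster operator):
+
+```python
+jobs = [Service(
+  # ... cluster, environment, role, name, task as usual ...
+  production = True)]  # preempts non-production tasks instead of being preempted
+```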
+
+### Making Room for Maintenance: DRAINING
+
+Cluster operators can put an agent into maintenance mode. This transitions
+all `Task`s running on that agent into `DRAINING` and eventually to `KILLED`.
+Drained `Task`s will be restarted on other agents for which no maintenance
+has been announced yet.
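+
+Maintenance is typically initiated via the admin client; a sketch of the
+drain command (verify the exact subcommand and flags for your version;
+host and cluster names are placeholders):
+
+    aurora_admin host_drain --hosts=agent01.example.com devcluster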
+
+
+
+## State Reconciliation
+
+Due to the many inevitable realities of distributed systems, there might
+be a mismatch of perceived and actual cluster state (e.g. a machine returns
+from a `netsplit` but the scheduler has already marked all its `Task`s as
+`LOST` and rescheduled them).
+
+Aurora regularly runs a state reconciliation process in order to detect
+and correct such issues (e.g. by killing the errant `RUNNING` tasks).
+By default, the proper detection of all failure scenarios and inconsistencies
+may take up to an hour.
+
+To emphasize this point: there is no uniqueness guarantee for a single
+instance of a job in the presence of network partitions. If a `Task`
+requires that, uniqueness should be enforced at the application level
+using a distributed coordination service such as ZooKeeper.

Added: aurora/site/source/documentation/latest/operations/troubleshooting.md
URL: http://svn.apache.org/viewvc/aurora/site/source/documentation/latest/operations/troubleshooting.md?rev=1799392&view=auto
==============================================================================
--- aurora/site/source/documentation/latest/operations/troubleshooting.md (added)
+++ aurora/site/source/documentation/latest/operations/troubleshooting.md Wed Jun 21 06:36:21 2017
@@ -0,0 +1,106 @@
+# Troubleshooting
+
+So you've started your first cluster and are running into some issues? We've
+collected some common stumbling blocks and solutions here to help get you moving.
+
+## Replicated log not initialized
+
+### Symptoms
+- Scheduler RPCs and web interface claim `Storage is not READY`
+- Scheduler log repeatedly prints messages like
+
+  ```
+  I1016 16:12:27.234133 26081 replica.cpp:638] Replica in EMPTY status
+  received a broadcasted recover request
+  I1016 16:12:27.234256 26084 recover.cpp:188] Received a recover response
+  from a replica in EMPTY status
+  ```
+
+### Solution
+When you create a new cluster, you need to inform a quorum of schedulers
+that they are safe to consider their database to be empty by
+[initializing](../installation/#finalizing) the replicated log. This is
+done to prevent the scheduler from modifying the cluster state in the
+event of multiple simultaneous disk failures or, more likely,
+misconfiguration of the replicated log path.
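+
+Concretely, this is a one-time command on each scheduler host; a sketch
+following the install guide (the path is an assumption and must match
+your configured `-native_log_file_path`):
+
+    sudo -u aurora mesos-log initialize --path="/var/lib/aurora/scheduler/db"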
+
+
+## No distinct leader elected
+
+### Symptoms
+Either no scheduler or multiple schedulers believe themselves to be leading.
+
+### Solution
+Verify that the [network configuration](../configuration/#network-configuration)
+of the Aurora scheduler is correct:
+
+* The `LIBPROCESS_IP:LIBPROCESS_PORT` endpoints must be reachable from all
+  coordinator nodes running a scheduler or a Mesos master.
+* Hostname lookups have to resolve to public IPs rather than local ones that
+  cannot be reached from another node.
+
+In addition, double-check the [quota settings](../configuration/#replicated-log-configuration)
+of the replicated log.
+
+
+## Scheduler not registered
+
+### Symptoms
+Scheduler log contains
+
+    Framework has not been registered within the tolerated delay.
+
+### Solution
+Double-check that the scheduler is configured correctly to reach the Mesos
+master. If you are registering the master in ZooKeeper, make sure the
+command line argument to the master:
+
+    --zk=zk://$ZK_HOST:2181/mesos/master
+
+is the same as the one on the scheduler:
+
+    -mesos_master_address=zk://$ZK_HOST:2181/mesos/master
+
+
+## Scheduler not running
+
+### Symptoms
+The scheduler process commits suicide regularly. This happens under error
+conditions, but also on purpose at regular intervals.
+
+### Solution
+Aurora is meant to be run under supervision. You have to configure a
+supervisor like [Monit](http://mmonit.com/monit/),
+[supervisord](http://supervisord.org/), or systemd to run the scheduler
+and restart it whenever it fails or exits on purpose.
+
+Aurora supports an active health checking protocol on its admin HTTP
+interface - if a `GET /health` times out or returns anything other than
+`200 OK` the scheduler process is unhealthy and should be restarted.
+
+For example, monit can be configured with
+
+    if failed port 8081 send "GET /health HTTP/1.0\r\n" expect "OK\n" with timeout 2 seconds for 10 cycles then restart
+
+assuming you set `-http_port=8081`.
+
+
+## Executor crashing or hanging
+
+### Symptoms
+Launched task instances never transition to `STARTING` or `RUNNING` but immediately transition
+to `FAILED` or `LOST`.
+
+### Solution
+The executor might be failing due to unknown internal errors such as a
+missing native dependency of the Mesos executor library. Open the Mesos UI
+and navigate to the failing task in question. Inspect the various log files
+to learn what is going on.
+
+
+## Observer does not discover tasks
+
+### Symptoms
+The observer UI does not list any tasks. When navigating from the scheduler
+UI to the state of a particular task instance, the observer returns
+`Error: 404 Not Found`.
+
+### Solution
+The observer refreshes its internal state every couple of seconds. If waiting
+a few seconds does not resolve the issue, check that the `--mesos-root`
+setting of the observer and the `--work_dir` option of the Mesos agent are in
+sync. For details, see our
+[Install instructions](../installation/#worker-configuration).

Added: aurora/site/source/documentation/latest/operations/upgrades.md
URL: http://svn.apache.org/viewvc/aurora/site/source/documentation/latest/operations/upgrades.md?rev=1799392&view=auto
==============================================================================
--- aurora/site/source/documentation/latest/operations/upgrades.md (added)
+++ aurora/site/source/documentation/latest/operations/upgrades.md Wed Jun 21 06:36:21 2017
@@ -0,0 +1,41 @@
+# Upgrading Aurora
+
+Aurora can be updated from one version to the next without any downtime or restarts of running
+jobs. The same holds true for Mesos.
+
+Generally speaking, Mesos and Aurora strive for +1/-1 version compatibility,
+i.e. all components are meant to be forward and backward compatible for at
+least one version. This implies it does not really matter in which order
+updates are carried out.
+
+Exceptions to this rule are documented in the [Aurora release-notes](../../../RELEASE-NOTES/)
+and the [Mesos upgrade instructions](https://mesos.apache.org/documentation/latest/upgrades/).
+
+
+## Instructions
+
+To upgrade Aurora, follow these steps:
+
+1. Update the first scheduler instance by updating its software and restarting
+   its process.
+2. Wait until the scheduler is up and its
+   [Replicated Log](../configuration/#replicated-log-configuration) has caught
+   up with the other schedulers in the cluster. The log has caught up if
+   `log/recovered` has the value `1`. You can check the metric via
+   `curl LIBPROCESS_IP:LIBPROCESS_PORT/metrics/snapshot` (see the example
+   after this list), where IP and port refer to the
+   [libmesos configuration](../configuration/#network-configuration) settings
+   of the scheduler instance.
+3. Proceed with the next scheduler until all instances are updated.
+4. Update the Aurora executor deployed to the compute nodes of your cluster.
+   Jobs will continue running with the old version of the executor, and will
+   only be launched by the new one once they are restarted eventually due to
+   natural cluster churn.
+5. Distribute the new Aurora client to your users.
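+
+For example, a sketch of the check from step 2 (IP and port are
+placeholders for your scheduler's libprocess settings):
+
+    curl -s http://$LIBPROCESS_IP:$LIBPROCESS_PORT/metrics/snapshot | grep log/recovered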
+
+
+## Best Practices
+
+Even though not absolutely mandatory, we advise adhering to the following rules:
+
+* Never skip any major or minor releases when updating. If you have to catch
+  up several releases, you have to deploy all intermediate versions. Skipping
+  bugfix releases is acceptable though.
+* Verify all updates on a test cluster before touching your production
+  deployments.
+* To minimize the number of failovers during updates, update the currently
+  leading scheduler instance last.
+* Update the Aurora executor on a subset of compute nodes as a canary before
+  deploying the change to the whole fleet.

Added: aurora/site/source/documentation/latest/reference/observer-configuration.md
URL: http://svn.apache.org/viewvc/aurora/site/source/documentation/latest/reference/observer-configuration.md?rev=1799392&view=auto
==============================================================================
--- aurora/site/source/documentation/latest/reference/observer-configuration.md (added)
+++ aurora/site/source/documentation/latest/reference/observer-configuration.md Wed Jun 21 06:36:21 2017
@@ -0,0 +1,89 @@
+# Observer Configuration Reference
+
+The Aurora/Thermos observer can take a variety of configuration options
+through command-line arguments.
+A list of the available options can be seen by running `thermos_observer --long-help`.
+
+Please refer to the [Operator Configuration Guide](../../operations/configuration/)
+for details on how to properly set the most important options.
+
+```
+$ thermos_observer.pex --long-help
+Options:
+  -h, --help, --short-help
+                        show this help message and exit.
+  --long-help           show options from all registered modules, not just the
+                        __main__ module.
+  --mesos-root=MESOS_ROOT
+                        The mesos root directory to search for Thermos
+                        executor sandboxes [default: /var/lib/mesos]
+  --ip=IP               The IP address the observer will bind to. [default:
+                        0.0.0.0]
+  --port=PORT           The port on which the observer should listen.
+                        [default: 1338]
+  --polling_interval_secs=POLLING_INTERVAL_SECS
+                        The number of seconds between observer refresh
+                        attempts. [default: 5]
+  --task_process_collection_interval_secs=TASK_PROCESS_COLLECTION_INTERVAL_SECS
+                        The number of seconds between per task process
+                        resource collections. [default: 20]
+  --task_disk_collection_interval_secs=TASK_DISK_COLLECTION_INTERVAL_SECS
+                        The number of seconds between per task disk resource
+                        collections. [default: 60]
+
+  From module twitter.common.app:
+    --app_daemonize     Daemonize this application. [default: False]
+    --app_profile_output=FILENAME
+                        Dump the profiling output to a binary profiling
+                        format. [default: None]
+    --app_daemon_stderr=TWITTER_COMMON_APP_DAEMON_STDERR
+                        Direct this app's stderr to this file if daemonized.
+                        [default: /dev/null]
+    --app_debug         Print extra debugging information during application
+                        initialization. [default: False]
+    --app_rc_filename   Print the filename for the rc file and quit. [default:
+                        False]
+    --app_daemon_stdout=TWITTER_COMMON_APP_DAEMON_STDOUT
+                        Direct this app's stdout to this file if daemonized.
+                        [default: /dev/null]
+    --app_profiling     Run profiler on the code while it runs.  Note this can
+                        cause slowdowns. [default: False]
+    --app_ignore_rc_file
+                        Ignore default arguments from the rc file. [default:
+                        False]
+    --app_pidfile=TWITTER_COMMON_APP_PIDFILE
+                        The pidfile to use if --app_daemonize is specified.
+                        [default: None]
+
+  From module twitter.common.log.options:
+    --log_to_stdout=[scheme:]LEVEL
+                        OBSOLETE - legacy flag, use --log_to_stderr instead.
+                        [default: ERROR]
+    --log_to_stderr=[scheme:]LEVEL
+                        The level at which logging to stderr [default: ERROR].
+                        Takes either LEVEL or scheme:LEVEL, where LEVEL is one
+                        of ['INFO', 'NONE', 'WARN', 'ERROR', 'DEBUG', 'FATAL']
+                        and scheme is one of ['google', 'plain'].
+    --log_to_disk=[scheme:]LEVEL
+                        The level at which logging to disk [default: INFO].
+                        Takes either LEVEL or scheme:LEVEL, where LEVEL is one
+                        of ['INFO', 'NONE', 'WARN', 'ERROR', 'DEBUG', 'FATAL']
+                        and scheme is one of ['google', 'plain'].
+    --log_dir=DIR       The directory into which log files will be generated
+                        [default: /var/tmp].
+    --log_simple        Write a single log file rather than one log file per
+                        log level [default: False].
+    --log_to_scribe=[scheme:]LEVEL
+                        The level at which logging to scribe [default: NONE].
+                        Takes either LEVEL or scheme:LEVEL, where LEVEL is one
+                        of ['INFO', 'NONE', 'WARN', 'ERROR', 'DEBUG', 'FATAL']
+                        and scheme is one of ['google', 'plain'].
+    --scribe_category=CATEGORY
+                        The category used when logging to the scribe daemon.
+                        [default: python_default].
+    --scribe_buffer     Buffer messages when scribe is unavailable rather than
+                        dropping them. [default: False].
+    --scribe_host=HOST  The host running the scribe daemon. [default:
+                        localhost].
+    --scribe_port=PORT  The port used to connect to the scribe daemon.
+                        [default: 1463].
+```


