From: serb@apache.org
To: commits@aurora.apache.org
Reply-To: dev@aurora.apache.org
Subject: svn commit: r1748470 [19/19] - in /aurora/site: data/ publish/ publish/blog/ publish/blog/aurora-0-14-0-released/ publish/documentation/0.10.0/ publish/documentation/0.10.0/build-system/ publish/documentation/0.10.0/client-cluster-configuration/ publis...
Date: Tue, 14 Jun 2016 21:35:30 -0000
Message-Id: <20160614213533.C54F33A0096@svn01-us-west.apache.org>

Modified: aurora/site/source/documentation/latest/development/db-migration.md
URL: http://svn.apache.org/viewvc/aurora/site/source/documentation/latest/development/db-migration.md?rev=1748470&r1=1748469&r2=1748470&view=diff
==============================================================================
--- aurora/site/source/documentation/latest/development/db-migration.md (original)
+++ aurora/site/source/documentation/latest/development/db-migration.md Tue Jun 14 21:35:25 2016
@@ -10,10 +10,11 @@ a snapshot is restored, no manual intera
 Upgrades
 --------
-When adding or altering tables or changing data, a new migration class should be created under the
-org.apache.aurora.scheduler.storage.db.migration package. The class should implement the
-[MigrationScript](https://github.com/mybatis/migrations/blob/master/src/main/java/org/apache/ibatis/migration/MigrationScript.java)
-interface (see [V001_TestMigration](https://github.com/apache/aurora/blob/rel/0.13.0/src/test/java/org/apache/aurora/scheduler/storage/db/testmigration/V001_TestMigration.java)
+When adding or altering tables or changing data, in addition to making the change in
+[schema.sql](../../src/main/resources/org/apache/aurora/scheduler/storage/db/schema.sql), a new
+migration class should be created under the org.apache.aurora.scheduler.storage.db.migration
+package. The class should implement the [MigrationScript](https://github.com/mybatis/migrations/blob/master/src/main/java/org/apache/ibatis/migration/MigrationScript.java)
+interface (see [V001_TestMigration](https://github.com/apache/aurora/blob/rel/0.14.0/src/test/java/org/apache/aurora/scheduler/storage/db/testmigration/V001_TestMigration.java)
 as an example). The upgrade and downgrade scripts are defined in this class. When restoring a
 snapshot the list of migrations on the classpath is compared to the list of applied changes in the
 DB. Any changes that have not yet been applied are executed and their downgrade script is stored
@@ -28,6 +29,6 @@ applied.
 Baselines
 ---------
 After enough time has passed (at least 1 official release), it should be safe to baseline migrations
-if desired. This can be accomplished by adding the changes from migrations directly to
-[schema.sql](../../src/main/resources/org/apache/aurora/scheduler/storage/db/schema.sql), removing
-the corresponding migration classes and adding a migration to remove the changelog entries.
\ No newline at end of file
+if desired. This can be accomplished by ensuring the changes from migrations have been applied to
+[schema.sql](../../src/main/resources/org/apache/aurora/scheduler/storage/db/schema.sql) and then
+removing the corresponding migration classes and adding a migration to remove the changelog entries.
\ No newline at end of file

Modified: aurora/site/source/documentation/latest/development/design-documents.md
URL: http://svn.apache.org/viewvc/aurora/site/source/documentation/latest/development/design-documents.md?rev=1748470&r1=1748469&r2=1748470&view=diff
==============================================================================
--- aurora/site/source/documentation/latest/development/design-documents.md (original)
+++ aurora/site/source/documentation/latest/development/design-documents.md Tue Jun 14 21:35:25 2016
@@ -9,6 +9,7 @@ in the proposed form.
 Current and past documents:
 * [Command Hooks for the Aurora Client](../design/command-hooks/)
+* [GPU Resources in Aurora](https://docs.google.com/document/d/1J9SIswRMpVKQpnlvJAMAJtKfPP7ZARFknuyXl-2aZ-M/edit)
 * [Health Checks for Updates](https://docs.google.com/document/d/1ZdgW8S4xMhvKW7iQUX99xZm10NXSxEWR0a-21FP5d94/edit)
 * [JobUpdateDiff thrift API](https://docs.google.com/document/d/1Fc_YhhV7fc4D9Xv6gJzpfooxbK4YWZcvzw6Bd3qVTL8/edit)
 * [REST API RFC](https://docs.google.com/document/d/11_lAsYIRlD5ETRzF2eSd3oa8LXAHYFD8rSetspYXaf4/edit)

Modified: aurora/site/source/documentation/latest/development/thrift.md
URL: http://svn.apache.org/viewvc/aurora/site/source/documentation/latest/development/thrift.md?rev=1748470&r1=1748469&r2=1748470&view=diff
==============================================================================
--- aurora/site/source/documentation/latest/development/thrift.md (original)
+++ aurora/site/source/documentation/latest/development/thrift.md Tue Jun 14 21:35:25 2016
@@ -6,7 +6,7 @@ client/server RPC protocol as well as fo
 correctly handling additions and renames of the existing members, field removals must be
 done carefully to ensure backwards compatibility and provide a predictable deprecation cycle.
 This document describes general guidelines for making Thrift schema changes to the existing fields in
-[api.thrift](https://github.com/apache/aurora/blob/rel/0.13.0/api/src/main/thrift/org/apache/aurora/gen/api.thrift).
+[api.thrift](https://github.com/apache/aurora/blob/rel/0.14.0/api/src/main/thrift/org/apache/aurora/gen/api.thrift).
 It is highly recommended to go through the
 [Thrift: The Missing Guide](http://diwakergupta.github.io/thrift-missing-guide/) first to refresh on
@@ -33,7 +33,7 @@ communicate with scheduler/client from v
 * Add a new field as an eventual replacement of the old one and implement a dual read/write
 anywhere the old field is used. If a thrift struct is mapped in the DB store make sure both
 columns are marked as `NOT NULL`
-* Check [storage.thrift](https://github.com/apache/aurora/blob/rel/0.13.0/api/src/main/thrift/org/apache/aurora/gen/storage.thrift) to see if
+* Check [storage.thrift](https://github.com/apache/aurora/blob/rel/0.14.0/api/src/main/thrift/org/apache/aurora/gen/storage.thrift) to see if
 the affected struct is stored in Aurora scheduler storage. If so, it's almost certainly also
 necessary to perform a [DB migration](../db-migration/).
 * Add a deprecation jira ticket into the vCurrent+1 release candidate

Modified: aurora/site/source/documentation/latest/features/constraints.md
URL: http://svn.apache.org/viewvc/aurora/site/source/documentation/latest/features/constraints.md?rev=1748470&r1=1748469&r2=1748470&view=diff
==============================================================================
--- aurora/site/source/documentation/latest/features/constraints.md (original)
+++ aurora/site/source/documentation/latest/features/constraints.md Tue Jun 14 21:35:25 2016
@@ -1,7 +1,7 @@
 Scheduling Constraints
 ======================
-By default, Aurora will pick any random slave with sufficient resources
+By default, Aurora will pick any random agent with sufficient resources
 in order to schedule a task. This scheduling choice can be further restricted
 with the help of constraints.
@@ -11,10 +11,10 @@ Mesos Attributes
 Data centers are often organized with hierarchical failure domains. Common failure domains
 include hosts, racks, rows, and PDUs. If you have this information available, it is wise to tag
-the Mesos slave with them as
+the Mesos agent with them as
 [attributes](https://mesos.apache.org/documentation/attributes-resources/).
-The Mesos slave `--attributes` command line argument can be used to mark slaves with
+The Mesos agent `--attributes` command line argument can be used to mark agents with
 static key/value pairs, so called attributes (not to be confused with `--resources`, which are
 dynamic and accounted).
@@ -58,7 +58,7 @@ Value Constraints
 -----------------
 Value constraints can be used to express that a certain attribute with a certain value
-should be present on a Mesos slave. For example, the following job would only be
+should be present on a Mesos agent. For example, the following job would only be
 scheduled on nodes that claim to have an `SSD` as their disk.
     Service(
@@ -94,18 +94,18 @@ the scheduler requires that the `$role`
 configuration, and will reject the job creation otherwise.
 The remainder of the attribute is free-form. We've developed the idiom of formatting this attribute
 as `$role/$job`, but do not enforce this. For example: a job
 `devcluster/www-data/prod/hello` with a dedicated constraint set as
-`www-data/web.multi` will have its tasks scheduled only on Mesos slaves configured with:
+`www-data/web.multi` will have its tasks scheduled only on Mesos agents configured with:
 `--attributes=dedicated:www-data/web.multi`.
 A wildcard (`*`) may be used for the role portion of the dedicated attribute, which will allow any
 owner to elect for a job to run on the host(s). For example: tasks from both
 `devcluster/www-data/prod/hello` and `devcluster/vagrant/test/hello` with a dedicated constraint
-formatted as `*/web.multi` will be scheduled only on Mesos slaves configured with
+formatted as `*/web.multi` will be scheduled only on Mesos agents configured with
 `--attributes=dedicated:*/web.multi`. This may be useful when assembling a virtual cluster of
 machines sharing the same set of traits or requirements.
 ##### Example
-Consider the following slave command line:
+Consider the following agent command line:
     mesos-slave --attributes="dedicated:db_team/redis" ...
@@ -120,7 +120,7 @@ And this job configuration:
       ...
     )
-The job configuration is indicating that it should only be scheduled on slaves with the attribute
+The job configuration is indicating that it should only be scheduled on agents with the attribute
 `dedicated:db_team/redis`. Additionally, Aurora will prevent any tasks that do _not_ have that
-constraint from running on those slaves.
+constraint from running on those agents.
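The dedicated-constraint rules in the constraints.md diff above (the job's constraint must equal the agent's `dedicated` attribute, with `*` allowed in the role portion) can be modeled in a few lines. This is a simplified illustration under those stated rules, not Aurora's actual scheduler code; `matches_dedicated` is a hypothetical helper:

```python
def matches_dedicated(constraint, agent_attribute, role):
    """Simplified model of dedicated-constraint matching.

    constraint       -- the job's dedicated value, e.g. 'www-data/web.multi'
                        or '*/web.multi'
    agent_attribute  -- the agent's value from --attributes=dedicated:<value>
    role             -- the role of the job being scheduled
    """
    # The job's constraint and the agent's attribute must be identical...
    if constraint != agent_attribute:
        return False
    # ...and unless the role portion is the wildcard '*', it must match
    # the job's own role (the scheduler rejects mismatched roles).
    role_part = constraint.split('/', 1)[0]
    return role_part == '*' or role_part == role

# 'www-data/web.multi' is only usable by the www-data role:
print(matches_dedicated('www-data/web.multi', 'www-data/web.multi', 'www-data'))  # True
print(matches_dedicated('www-data/web.multi', 'www-data/web.multi', 'vagrant'))   # False
# The wildcard form lets any role elect to run on the host:
print(matches_dedicated('*/web.multi', '*/web.multi', 'vagrant'))                 # True
```

Note this sketch ignores everything else the scheduler considers (resources, other constraints); it only illustrates the string-matching rule described above.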
Modified: aurora/site/source/documentation/latest/features/containers.md
URL: http://svn.apache.org/viewvc/aurora/site/source/documentation/latest/features/containers.md?rev=1748470&r1=1748469&r2=1748470&view=diff
==============================================================================
--- aurora/site/source/documentation/latest/features/containers.md (original)
+++ aurora/site/source/documentation/latest/features/containers.md Tue Jun 14 21:35:25 2016
@@ -1,7 +1,6 @@
 Containers
 ==========
-
 Docker
 ------
@@ -11,7 +10,7 @@ Example (available in the [Vagrant envir
     $ cat /vagrant/examples/jobs/docker/hello_docker.aurora
-    hello_docker = Process(
+    hello_world_proc = Process(
       name = 'hello',
       cmdline = """
         while true; do
@@ -41,3 +40,21 @@ Example (available in the [Vagrant envir
 In order to correctly execute processes inside a job, the docker container must have Python 2.7
 installed. Further details of how to use Docker can be found in the
 [Reference Documentation](../../reference/configuration/#docker-object).
+
+Mesos
+-----
+
+*Note: In order to use filesystem images with Aurora, you must be running at least Mesos 0.28.x*
+
+Aurora supports specifying a task filesystem image to use with the [Mesos containerizer](http://mesos.apache.org/documentation/latest/container-image/).
+This is done by setting the `container` property of the Job to a `Mesos` container object
+that includes the image to use. Both [AppC](https://github.com/appc/spec/blob/master/SPEC.md) and
+[Docker](https://github.com/docker/docker/blob/master/image/spec/v1.md) images are supported.
+
+```
+job = Job(
+  ...
+  container = Mesos(image=DockerImage(name='my-image', tag='my-tag'))
+  ...
+)
+```

Modified: aurora/site/source/documentation/latest/features/job-updates.md
URL: http://svn.apache.org/viewvc/aurora/site/source/documentation/latest/features/job-updates.md?rev=1748470&r1=1748469&r2=1748470&view=diff
==============================================================================
--- aurora/site/source/documentation/latest/features/job-updates.md (original)
+++ aurora/site/source/documentation/latest/features/job-updates.md Tue Jun 14 21:35:25 2016
@@ -71,7 +71,7 @@ acknowledging ("heartbeating") job updat
 service updates where explicit job health monitoring is vital during the entire job update
 lifecycle. Such job updates would rely on an external service (or a custom client) periodically
 pulsing an active coordinated job update via a
-[pulseJobUpdate RPC](https://github.com/apache/aurora/blob/rel/0.13.0/api/src/main/thrift/org/apache/aurora/gen/api.thrift).
+[pulseJobUpdate RPC](https://github.com/apache/aurora/blob/rel/0.14.0/api/src/main/thrift/org/apache/aurora/gen/api.thrift).
 A coordinated update is defined by setting a positive
 [pulse_interval_secs](../../reference/configuration/#updateconfig-objects) value in job configuration

Modified: aurora/site/source/documentation/latest/features/resource-isolation.md
URL: http://svn.apache.org/viewvc/aurora/site/source/documentation/latest/features/resource-isolation.md?rev=1748470&r1=1748469&r2=1748470&view=diff
==============================================================================
--- aurora/site/source/documentation/latest/features/resource-isolation.md (original)
+++ aurora/site/source/documentation/latest/features/resource-isolation.md Tue Jun 14 21:35:25 2016
@@ -101,6 +101,14 @@ are still available but you shouldn't co
 application can write above its quota without getting an ENOSPC, but it will be killed
 shortly after. This is subject to change.
+
+### GPU Isolation
+
+GPU isolation will be supported for Nvidia devices starting from Mesos 0.29.0.
+Access to the allocated units will be exclusive with no sharing between tasks
+allowed (e.g. no fractional GPU allocation). Until official documentation is released,
+see the [Mesos design document](https://docs.google.com/document/d/10GJ1A80x4nIEo8kfdeo9B11PIbS1xJrrB4Z373Ifkpo/edit#heading=h.w84lz7p4eexl)
+for more details.
+
 ### Other Resources
 Other resources, such as network bandwidth, do not have any performance
@@ -141,6 +149,10 @@ add the maximum size of the Java heap to
 order to account for an out of memory error dumping the heap
 into the application's sandbox space.
+
+## GPU Sizing
+
+GPU sizing is highly dependent on your application requirements and is only limited
+by the number of physical GPU units available on a target box.
 Oversubscription
 ----------------
@@ -158,10 +170,10 @@ jobs. If not configured properly revocab
     -receive_revocable_resources=true
-Specify a tier configuration file path (unless you want to use the [default](https://github.com/apache/aurora/blob/rel/0.13.0/src/main/resources/org/apache/aurora/scheduler/tiers.json)):
+Specify a tier configuration file path (unless you want to use the [default](https://github.com/apache/aurora/blob/rel/0.14.0/src/main/resources/org/apache/aurora/scheduler/tiers.json)):
     -tier_config=path/to/tiers/config.json
-See the [Configuration Reference](../../references/configuration/) for details on how to mark a job
+See the [Configuration Reference](../../reference/configuration/) for details on how to mark a job
 as being revocable.
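The disk-sizing guidance in the resource-isolation.md diff above (leave room for artifacts, logs, and a potential out-of-memory heap dump in the sandbox) amounts to simple arithmetic. A sketch with hypothetical numbers; the figures are illustrative, not recommendations:

```python
# Hypothetical sizing arithmetic for a JVM service's disk request, following
# the guidance above: the sandbox must also hold a heap dump on OOM, which
# can be as large as the configured maximum heap (-Xmx).
binary_mb = 150      # application artifact(s) fetched into the sandbox
log_mb = 300         # retained log output
max_heap_mb = 512    # -Xmx setting; an OOM heap dump can be this large

disk_request_mb = binary_mb + log_mb + max_heap_mb
print(disk_request_mb)  # 962
```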
Modified: aurora/site/source/documentation/latest/features/service-discovery.md
URL: http://svn.apache.org/viewvc/aurora/site/source/documentation/latest/features/service-discovery.md?rev=1748470&r1=1748469&r2=1748470&view=diff
==============================================================================
--- aurora/site/source/documentation/latest/features/service-discovery.md (original)
+++ aurora/site/source/documentation/latest/features/service-discovery.md Tue Jun 14 21:35:25 2016
@@ -28,17 +28,15 @@ An example is using [Mesos-DNS](https://
 records. With current implementation, the example job with key
 `devcluster/vagrant/test/http-example` generates at least the following:
-1. An A record for `http_example.test.vagrant.twitterscheduler.mesos` (which only includes IP address);
+1. An A record for `http_example.test.vagrant.aurora.mesos` (which only includes IP address);
 2. A [SRV record](https://en.wikipedia.org/wiki/SRV_record) for
-   `_http_example.test.vagrant._tcp.twitterscheduler.mesos`, which includes IP address and every port. This should only
+   `_http_example.test.vagrant._tcp.aurora.mesos`, which includes IP address and every port. This should only
    be used if the service has one port.
-3. A SRV record `_{port-name}._http_example.test.vagrant._tcp.twitterscheduler.mesos` for each port name
+3. A SRV record `_{port-name}._http_example.test.vagrant._tcp.aurora.mesos` for each port name
    defined. This should be used when the service has multiple ports.
 Things to note:
 1. The domain part (".mesos" in above example) can be configured in
    [Mesos DNS](http://mesosphere.github.io/mesos-dns/docs/configuration-parameters.html);
-2. The `twitterscheduler` part is the lower-case of framework name, which is not configurable right now (see
-   [TWITTER_SCHEDULER_NAME](https://github.com/apache/aurora/blob/master/src/main/java/org/apache/aurora/scheduler/mesos/CommandLineDriverSettingsModule.java#L98));
-3. Right now, portmap and port aliases in announcer object are not reflected in DiscoveryInfo, therefore not visible in
-   Mesos DNS records either. This is because they are only resolved in thermos executors.
\ No newline at end of file
+2. Right now, portmap and port aliases in announcer object are not reflected in DiscoveryInfo, therefore not visible in
+   Mesos DNS records either. This is because they are only resolved in thermos executors.

Modified: aurora/site/source/documentation/latest/features/sla-metrics.md
URL: http://svn.apache.org/viewvc/aurora/site/source/documentation/latest/features/sla-metrics.md?rev=1748470&r1=1748469&r2=1748470&view=diff
==============================================================================
--- aurora/site/source/documentation/latest/features/sla-metrics.md (original)
+++ aurora/site/source/documentation/latest/features/sla-metrics.md Tue Jun 14 21:35:25 2016
@@ -45,11 +45,11 @@ will not degrade this metric.*
 A fault in the task environment may cause the Aurora/Mesos to have different views on the task
 state or lose track of the task existence. In such cases, the service task is marked as LOST and
 rescheduled by Aurora. For example, this may happen when the task stays in ASSIGNED or STARTING
-for too long or the Mesos slave becomes unhealthy (or disappears completely). The time between
+for too long or the Mesos agent becomes unhealthy (or disappears completely). The time between
 task entering LOST and its replacement reaching RUNNING state is counted towards platform downtime.
 Another example of a platform downtime event is the administrator-requested task rescheduling. This
-happens during planned Mesos slave maintenance when all slave tasks are marked as DRAINED and
+happens during planned Mesos agent maintenance when all agent tasks are marked as DRAINED and
 rescheduled elsewhere.
 To accurately calculate Platform Uptime, we must separate platform incurred downtime from user
@@ -62,7 +62,7 @@ relevant to uptime calculations. By appl
 transition records, we can build a deterministic downtime trace for every given service instance.
 A task going through a state transition carries one of three possible SLA meanings
-(see [SlaAlgorithm.java](https://github.com/apache/aurora/blob/rel/0.13.0/src/main/java/org/apache/aurora/scheduler/sla/SlaAlgorithm.java) for
+(see [SlaAlgorithm.java](https://github.com/apache/aurora/blob/rel/0.14.0/src/main/java/org/apache/aurora/scheduler/sla/SlaAlgorithm.java) for
 sla-to-task-state mapping):
 * Task is UP: starts a period where the task is considered to be up and running from the Aurora
@@ -109,7 +109,7 @@ metric that helps track the dependency o
 * Per job - `sla__mtta_ms`
 * Per cluster - `sla_cluster_mtta_ms`
 * Per instance size (small, medium, large, x-large, xx-large). Sizes are defined in:
-[ResourceAggregates.java](https://github.com/apache/aurora/blob/rel/0.13.0/src/main/java/org/apache/aurora/scheduler/base/ResourceAggregates.java)
+[ResourceAggregates.java](https://github.com/apache/aurora/blob/rel/0.14.0/src/main/java/org/apache/aurora/scheduler/base/ResourceAggregates.java)
 * By CPU:
   * `sla_cpu_small_mtta_ms`
   * `sla_cpu_medium_mtta_ms`
@@ -145,7 +145,7 @@ reflecting on the overall time it takes
 * Per job - `sla__mttr_ms`
 * Per cluster - `sla_cluster_mttr_ms`
 * Per instance size (small, medium, large, x-large, xx-large). Sizes are defined in:
-[ResourceAggregates.java](https://github.com/apache/aurora/blob/rel/0.13.0/src/main/java/org/apache/aurora/scheduler/base/ResourceAggregates.java)
+[ResourceAggregates.java](https://github.com/apache/aurora/blob/rel/0.14.0/src/main/java/org/apache/aurora/scheduler/base/ResourceAggregates.java)
 * By CPU:
   * `sla_cpu_small_mttr_ms`
   * `sla_cpu_medium_mttr_ms`

Added: aurora/site/source/documentation/latest/features/webhooks.md
URL: http://svn.apache.org/viewvc/aurora/site/source/documentation/latest/features/webhooks.md?rev=1748470&view=auto
==============================================================================
--- aurora/site/source/documentation/latest/features/webhooks.md (added)
+++ aurora/site/source/documentation/latest/features/webhooks.md Tue Jun 14 21:35:25 2016
@@ -0,0 +1,80 @@
+Webhooks
+========
+
+Aurora has an optional feature which allows an operator to specify a file to configure an HTTP
+webhook to receive task state change events. It can be enabled with a scheduler flag, e.g.
+`-webhook_config=/path/to/webhook.json`. At this point, webhooks are still considered *experimental*.
+
+Below is a sample configuration:
+
+```json
+{
+  "headers": {
+    "Content-Type": "application/vnd.kafka.json.v1+json",
+    "Producer-Type": "reliable"
+  },
+  "targetURL": "http://localhost:5000/",
+  "timeoutMsec": 5
+}
+```
+
+And an example of a response that you will get back:
+
+```json
+{
+  "task":
+  {
+    "cachedHashCode":0,
+    "assignedTask": {
+      "cachedHashCode":0,
+      "taskId":"vagrant-test-http_example-8-a6cf7ec5-d793-49c7-b10f-0e14ab80bfff",
+      "task": {
+        "cachedHashCode":-1819348376,
+        "job": {
+          "cachedHashCode":803049425,
+          "role":"vagrant",
+          "environment":"test",
+          "name":"http_example"
+        },
+        "owner": {
+          "cachedHashCode":226895216,
+          "user":"vagrant"
+        },
+        "isService":true,
+        "numCpus":0.1,
+        "ramMb":16,
+        "diskMb":8,
+        "priority":0,
+        "maxTaskFailures":1,
+        "production":false,
+        "resources":[
+          {"cachedHashCode":729800451,"setField":"NUM_CPUS","value":0.1},
+          {"cachedHashCode":552899914,"setField":"RAM_MB","value":16},
+          {"cachedHashCode":-1547868317,"setField":"DISK_MB","value":8},
+          {"cachedHashCode":1957328227,"setField":"NAMED_PORT","value":"http"},
+          {"cachedHashCode":1954229436,"setField":"NAMED_PORT","value":"tcp"}
+        ],
+        "constraints":[],
+        "requestedPorts":["http","tcp"],
+        "taskLinks":{"http":"http://%host%:%port:http%"},
+        "contactEmail":"vagrant@localhost",
+        "executorConfig": {
+          "cachedHashCode":-1194797325,
+          "name":"AuroraExecutor",
+          "data": "{\"environment\": \"test\", \"health_check_config\": {\"initial_interval_secs\": 5.0, \"health_checker\": { \"http\": {\"expected_response_code\": 0, \"endpoint\": \"/health\", \"expected_response\": \"ok\"}}, \"max_consecutive_failures\": 0, \"timeout_secs\": 1.0, \"interval_secs\": 1.0}, \"name\": \"http_example\", \"service\": true, \"max_task_failures\": 1, \"cron_collision_policy\": \"KILL_EXISTING\", \"enable_hooks\": false, \"cluster\": \"devcluster\", \"task\": {\"processes\": [{\"daemon\": false, \"name\": \"echo_ports\", \"ephemeral\": false, \"max_failures\": 1, \"min_duration\": 5, \"cmdline\": \"echo \\\"tcp port: {{thermos.ports[tcp]}}; http port: {{thermos.ports[http]}}; alias: {{thermos.ports[alias]}}\\\"\", \"final\": false}, {\"daemon\": false, \"name\": \"stage_server\", \"ephemeral\": false, \"max_failures\": 1, \"min_duration\": 5, \"cmdline\": \"cp /vagrant/src/test/sh/org/apache/aurora/e2e/http_example.py .\", \"final\": false}, {\"daemon\": false, \"name\": \"run_server\", \"ephemeral\": false, \"max_failures\": 1, \"min_duration\": 5, \"cmdline\": \"python http_example.py {{thermos.ports[http]}}\", \"final\": false}], \"name\": \"http_example\", \"finalization_wait\": 30, \"max_failures\": 1, \"max_concurrency\": 0, \"resources\": {\"disk\": 8388608, \"ram\": 16777216, \"cpu\": 0.1}, \"constraints\": [{\"order\": [\"echo_ports\", \"stage_server\", \"run_server\"]}]}, \"production\": false, \"role\": \"vagrant\", \"contact\": \"vagrant@localhost\", \"announce\": {\"primary_port\": \"http\", \"portmap\": {\"alias\": \"http\"}}, \"lifecycle\": {\"http\": {\"graceful_shutdown_endpoint\": \"/quitquitquit\", \"port\": \"health\", \"shutdown_endpoint\": \"/abortabortabort\"}}, \"priority\": 0}"
+        },
+        "metadata":[],
+        "container":{
+          "cachedHashCode":-1955376216,
+          "setField":"MESOS",
+          "value":{"cachedHashCode":31}}
+      },
+      "assignedPorts":{},
+      "instanceId":8
+    },
+    "status":"PENDING",
+    "failureCount":0,
+    "taskEvents":[
+      {"cachedHashCode":0,"timestamp":1464992060258,"status":"PENDING","scheduler":"aurora"}]
+  },
+  "oldState":{}}
+```
+

Modified: aurora/site/source/documentation/latest/getting-started/overview.md
URL: http://svn.apache.org/viewvc/aurora/site/source/documentation/latest/getting-started/overview.md?rev=1748470&r1=1748469&r2=1748470&view=diff
==============================================================================
--- aurora/site/source/documentation/latest/getting-started/overview.md (original)
+++ aurora/site/source/documentation/latest/getting-started/overview.md Tue Jun 14 21:35:25 2016
@@ -53,6 +53,8 @@ a
 functioning Aurora cluster.
 When a user task is launched, the agent will launch the executor (in the context of a Linux cgroup
 or Docker container depending upon the environment), which will in turn fork user processes.
+
+In earlier versions of Mesos and Aurora, the Mesos agent was known as the Mesos slave.
+
 Jobs, Tasks and Processes
 --------------------------
@@ -73,7 +75,7 @@ A task can merely be a single *process*
 command line, such as `python2.7 my_script.py`. However, a task can also
 consist of many separate processes, which all run within a single
 sandbox. For example, running multiple cooperating agents together,
-such as `logrotate`, `installer`, master, or slave processes. This is
+such as `logrotate`, `installer`, master, or agent processes. This is
 where Thermos comes in. While Aurora provides a `Job` abstraction on
 top of Mesos `Tasks`, Thermos provides a `Process` abstraction
 underneath Mesos `Task`s and serves as part of the Aurora framework's

Modified: aurora/site/source/documentation/latest/getting-started/tutorial.md
URL: http://svn.apache.org/viewvc/aurora/site/source/documentation/latest/getting-started/tutorial.md?rev=1748470&r1=1748469&r2=1748470&view=diff
==============================================================================
--- aurora/site/source/documentation/latest/getting-started/tutorial.md (original)
+++ aurora/site/source/documentation/latest/getting-started/tutorial.md Tue Jun 14 21:35:25 2016
@@ -122,7 +122,7 @@ identifies a Job. A job key consists of
 in that order:
 * Cluster refers to the name of a particular Aurora installation.
-* Role names are user accounts existing on the slave machines. If you
+* Role names are user accounts existing on the agent machines. If you
 don't know what accounts are available, contact your sysadmin.
 * Environment names are namespaces; you can count on `test`, `devel`,
 `staging` and `prod` existing.
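The job key format described in the tutorial diff above (cluster, role, environment, and job name, in that order, joined with `/`) can be illustrated with a small parser. This is an illustrative helper, not part of the Aurora client:

```python
def parse_job_key(key):
    """Split an Aurora job key 'cluster/role/environment/jobname' into its
    four components. (Illustrative sketch, not the Aurora client's parser.)"""
    parts = key.split('/')
    if len(parts) != 4:
        raise ValueError('job key must have exactly four components: %r' % key)
    cluster, role, environment, jobname = parts
    return {'cluster': cluster, 'role': role,
            'environment': environment, 'job': jobname}

# The example key used throughout the constraints documentation:
print(parse_job_key('devcluster/www-data/prod/hello'))
# {'cluster': 'devcluster', 'role': 'www-data', 'environment': 'prod', 'job': 'hello'}
```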
Modified: aurora/site/source/documentation/latest/getting-started/vagrant.md
URL: http://svn.apache.org/viewvc/aurora/site/source/documentation/latest/getting-started/vagrant.md?rev=1748470&r1=1748469&r2=1748470&view=diff
==============================================================================
--- aurora/site/source/documentation/latest/getting-started/vagrant.md (original)
+++ aurora/site/source/documentation/latest/getting-started/vagrant.md Tue Jun 14 21:35:25 2016
@@ -70,12 +70,22 @@ This command uses the configuration scri
 This process takes several minutes to complete.
+
+You may notice a warning that guest additions in the VM don't match your version of VirtualBox.
+This should generally be harmless, but you may wish to install a vagrant plugin to take care of
+mismatches like this for you:
+
+    vagrant plugin install vagrant-vbguest
+
+With this plugin installed, whenever you `vagrant up` the plugin will upgrade the guest additions
+for you when a version mis-match is detected. You can read more about the plugin
+[here](https://github.com/dotless-de/vagrant-vbguest).
+
 To verify that Aurora is running on the cluster, visit the following URLs:
 * Scheduler - http://192.168.33.7:8081
 * Observer - http://192.168.33.7:1338
 * Mesos Master - http://192.168.33.7:5050
-* Mesos Slave - http://192.168.33.7:5051
+* Mesos Agent - http://192.168.33.7:5051
 Log onto the VM
@@ -129,9 +139,16 @@ you can use the command `vagrant destroy
 Troubleshooting
 ---------------
-Most of the vagrant related problems can be fixed by the following steps:
+Most of the Vagrant related problems can be fixed by the following steps:
 * Destroying the vagrant environment with `vagrant destroy`
 * Killing any orphaned VMs (see AURORA-499) with `virtualbox` UI or `VBoxManage` command line tool
 * Cleaning the repository of build artifacts and other intermediate output with `git clean -fdx`
 * Bringing up the vagrant environment with `vagrant up`
+
+If that still doesn't solve your problem, make sure to inspect the log files:
+
+* Scheduler: `/var/log/upstart/aurora-scheduler.log`
+* Observer: `/var/log/upstart/aurora-thermos-observer.log`
+* Mesos Master: `/var/log/mesos/mesos-master.INFO` (also see `.WARNING` and `.ERROR`)
+* Mesos Agent: `/var/log/mesos/mesos-slave.INFO` (also see `.WARNING` and `.ERROR`)

Modified: aurora/site/source/documentation/latest/index.html.md
URL: http://svn.apache.org/viewvc/aurora/site/source/documentation/latest/index.html.md?rev=1748470&r1=1748469&r2=1748470&view=diff
==============================================================================
--- aurora/site/source/documentation/latest/index.html.md (original)
+++ aurora/site/source/documentation/latest/index.html.md Tue Jun 14 21:35:25 2016
@@ -27,6 +27,7 @@ Description of important Aurora features
 * [Services](features/services/)
 * [Service Discovery](features/service-discovery/)
 * [SLA Metrics](features/sla-metrics/)
+* [Webhooks](features/webhooks/)
 ## Operators
 For those that wish to manage and fine-tune an Aurora cluster.
Modified: aurora/site/source/documentation/latest/operations/configuration.md URL: http://svn.apache.org/viewvc/aurora/site/source/documentation/latest/operations/configuration.md?rev=1748470&r1=1748469&r2=1748470&view=diff ============================================================================== --- aurora/site/source/documentation/latest/operations/configuration.md (original) +++ aurora/site/source/documentation/latest/operations/configuration.md Tue Jun 14 21:35:25 2016 @@ -69,13 +69,13 @@ for Mesos replicated log files to ensure ### `-native_log_zk_group_path` ZooKeeper path used for Mesos replicated log quorum discovery. -See [code](https://github.com/apache/aurora/blob/rel/0.13.0/src/main/java/org/apache/aurora/scheduler/log/mesos/MesosLogStreamModule.java) for +See [code](https://github.com/apache/aurora/blob/rel/0.14.0/src/main/java/org/apache/aurora/scheduler/log/mesos/MesosLogStreamModule.java) for other available Mesos replicated log configuration options and default values. ### Changing the Quorum Size Special care needs to be taken when changing the size of the Aurora scheduler quorum. Since Aurora uses a Mesos replicated log, similar steps need to be followed as when -[changing the mesos quorum size](http://mesos.apache.org/documentation/latest/operational-guide). +[changing the Mesos quorum size](http://mesos.apache.org/documentation/latest/operational-guide). As a preparation, increase `-native_log_quorum_size` on each existing scheduler and restart them. When updating from 3 to 5 schedulers, the quorum size would grow from 2 to 3. @@ -143,12 +143,127 @@ If you need to do computation before sta For example, to wrap the executor inside a simple wrapper, the scheduler will be started like this `-thermos_executor_path=/path/to/wrapper.sh -thermos_executor_resources=/usr/share/aurora/bin/thermos_executor.pex` +## Custom Executor

 +If the need arises to use a Mesos executor other than the Thermos executor, the scheduler can be +configured to utilize a custom executor by specifying the `-custom_executor_config` flag. +The flag must be set to the path of a valid executor configuration file.
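A custom executor configuration can be sanity-checked before it is handed to the scheduler. The following sketch (a hypothetical helper, not part of Aurora) enforces only the documented minimum — valid JSON with `name`, `command` and `resources` on the executor, and a `value` under `command`:

```python
import json

REQUIRED_EXECUTOR_FIELDS = ('name', 'command', 'resources')

def validate_executor_config(path):
    """Check the documented minimum for a -custom_executor_config file.
    A hypothetical helper for illustration, not the scheduler's own code."""
    with open(path) as f:
        config = json.load(f)  # raises ValueError if the file is not valid JSON
    executor = config['executor']
    missing = [field for field in REQUIRED_EXECUTOR_FIELDS if field not in executor]
    if missing:
        raise ValueError('executor config is missing: %s' % ', '.join(missing))
    if 'value' not in executor['command']:
        raise ValueError('command requires a value field')
    return config
```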
 + +The configuration file must be valid JSON and contain, at minimum, the name, command and resources fields. + + +### executor + +Property | Description +----------------------- | --------------------------------- +name (required) | Name of the executor. +command (required) | How to run the executor. +resources (required) | Overhead to use for each executor instance. + +#### command + +Property | Description +----------------------- | --------------------------------- +value (required) | The command to execute. +arguments (optional) | A list of arguments to pass to the command. +uris (optional) | List of resources to download into the task sandbox. + +##### uris (list) +* Follows the [Mesos Fetcher schema](http://mesos.apache.org/documentation/latest/fetcher/) + +Property | Description +----------------------- | --------------------------------- +value (required) | Path to the resource needed in the sandbox. +executable (optional) | Change resource to be executable via chmod. +extract (optional) | Extract files from packed or compressed archives into the sandbox. +cache (optional) | Use caching mechanism provided by Mesos for resources. + +#### resources (list) + +Property | Description +------------------- | --------------------------------- +name (required) | Name of the resource: cpus or mem. +type (required) | Type of resource. Should always be SCALAR. +scalar (required) | Value in float for cpus or int for mem (in MBs) + +### volume_mounts (list) + +Property | Description +------------------------ | --------------------------------- +host_path (required) | Host path to mount inside the container. +container_path (required) | Path inside the container where `host_path` will be mounted. +mode (required) | Mode in which to mount the volume, Read-Write (RW) or Read-Only (RO). + + +A sample configuration is as follows:
 +``` + { + "executor": { + "name": "myExecutor", + "command": { + "value": "myExecutor.sh", + "arguments": [ + "localhost:2181", + "-verbose", + "-config myConfiguration.config" + ], + "uris": [ + { + "value": "/dist/myExecutor.sh", + "executable": true, + "extract": false, + "cache": true + }, + { + "value": "/home/user/myConfiguration.config", + "executable": false, + "extract": false, + "cache": false + } + ] + }, + "resources": [ + { + "name": "cpus", + "type": "SCALAR", + "scalar": { + "value": 1.00 + } + }, + { + "name": "mem", + "type": "SCALAR", + "scalar": { + "value": 512 + } + } + ] + }, + "volume_mounts": [ + { + "mode": "RO", + "container_path": "/path/on/container", + "host_path": "/path/to/host/directory" + }, + { + "mode": "RW", + "container_path": "/container", + "host_path": "/host" + } + ] + } +``` + +It should be noted that if you do not use thermos or a thermos based executor, links in the scheduler's +Web UI for tasks
 will not work (at least for the time being). +Some information about launched tasks can still be accessed via the Mesos Web UI or via the Aurora Client. +Furthermore, this configuration replaces the default thermos executor. +Work is in progress to allow support for multiple executors to co-exist within a single scheduler. ### Docker containers In order for Aurora to launch jobs using docker containers, a few extra configuration options must be set. The [docker containerizer](http://mesos.apache.org/documentation/latest/docker-containerizer/) -must be enabled on the mesos slaves by launching them with the `--containerizers=docker,mesos` option. +must be enabled on the Mesos agents by launching them with the `--containerizers=docker,mesos` option. By default, Aurora will configure Mesos to copy the file specified in `-thermos_executor_path` into the container's sandbox. If using a wrapper script to launch the thermos executor, @@ -158,10 +273,10 @@ wrapper script and executor are correctl script does not access resources outside of the sandbox, as when the script is run from within a docker container those resources will not exist. -A scheduler flag, `-global_container_mounts` allows mounting paths from the host (i.e., the slave) +A scheduler flag, `-global_container_mounts` allows mounting paths from the host (i.e., the agent machine) into all containers on that host. The format is a comma separated list of host_path:container_path[:mode] tuples. For example `-global_container_mounts=/opt/secret_keys_dir:/mnt/secret_keys_dir:ro` mounts -`/opt/secret_keys_dir` from the slaves into all launched containers. Valid modes are `ro` and `rw`. 
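The `host_path:container_path[:mode]` tuple format can be illustrated with a small parser. This is a hypothetical sketch, not the scheduler's actual flag handling, and the assumption that an omitted mode means read-only is mine:

```python
def parse_global_container_mounts(flag_value):
    """Parse a -global_container_mounts style value: a comma-separated list
    of host_path:container_path[:mode] tuples. Illustrative only."""
    mounts = []
    for entry in flag_value.split(','):
        parts = entry.split(':')
        if len(parts) == 2:
            host_path, container_path, mode = parts[0], parts[1], 'ro'  # assumed default
        elif len(parts) == 3:
            host_path, container_path, mode = parts
        else:
            raise ValueError('expected host_path:container_path[:mode], got %r' % entry)
        if mode not in ('ro', 'rw'):
            raise ValueError('mode must be ro or rw, got %r' % mode)
        mounts.append({'host': host_path, 'container': container_path, 'mode': mode})
    return mounts
```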
If you would like to run a container with a read-only filesystem, it may also be necessary to use the scheduler flag `-thermos_home_in_sandbox` in order to set HOME to the sandbox Modified: aurora/site/source/documentation/latest/operations/installation.md URL: http://svn.apache.org/viewvc/aurora/site/source/documentation/latest/operations/installation.md?rev=1748470&r1=1748469&r2=1748470&view=diff ============================================================================== --- aurora/site/source/documentation/latest/operations/installation.md (original) +++ aurora/site/source/documentation/latest/operations/installation.md Tue Jun 14 21:35:25 2016 @@ -145,7 +145,7 @@ The executor typically does not require be passed to the executor using a command line argument on the scheduler. The observer needs to be configured to look at the correct mesos directory in order to find task -sandboxes. You should 1st find the Mesos working directory by looking for the Mesos slave +sandboxes. You should first find the Mesos working directory by looking for the Mesos agent `--work_dir` flag. You should see something like: ps -eocmd | grep "mesos-slave" | grep -v grep | tr ' ' '\n' | grep "\--work_dir" @@ -237,7 +237,7 @@ dev, test, prod) for a production job. ## Installing Mesos -Mesos uses a single package for the Mesos master and slave. As a result, the package dependencies +Mesos uses a single package for the Mesos master and agent. As a result, the package dependencies are identical for both. 
### Mesos on Ubuntu Trusty Modified: aurora/site/source/documentation/latest/operations/monitoring.md URL: http://svn.apache.org/viewvc/aurora/site/source/documentation/latest/operations/monitoring.md?rev=1748470&r1=1748469&r2=1748470&view=diff ============================================================================== --- aurora/site/source/documentation/latest/operations/monitoring.md (original) +++ aurora/site/source/documentation/latest/operations/monitoring.md Tue Jun 14 21:35:25 2016 @@ -119,7 +119,7 @@ The number of tasks stored in the schedu If this value is increasing at a high rate, it is a sign of trouble. -There are many sources of `LOST` tasks in Mesos: the scheduler, master, slave, and executor can all +There are many sources of `LOST` tasks in Mesos: the scheduler, master, agent, and executor can all trigger this. The first step is to look in the scheduler logs for `LOST` to identify where the state changes are originating. @@ -169,7 +169,7 @@ This value is currently known to increas value warrants investigation. The scheduler will log when it times out a task. You should trace the task ID of the timed out -task into the master, slave, and/or executors to determine where the message was dropped. +task into the master, agent, and/or executors to determine where the message was dropped. 
### `http_500_responses_events` Type: integer counter Modified: aurora/site/source/documentation/latest/reference/client-cluster-configuration.md URL: http://svn.apache.org/viewvc/aurora/site/source/documentation/latest/reference/client-cluster-configuration.md?rev=1748470&r1=1748469&r2=1748470&view=diff ============================================================================== --- aurora/site/source/documentation/latest/reference/client-cluster-configuration.md (original) +++ aurora/site/source/documentation/latest/reference/client-cluster-configuration.md Tue Jun 14 21:35:25 2016 @@ -27,8 +27,8 @@ The following properties may be set: **Property** | **Type** | **Description** :------------------------| :------- | :-------------- **name** | String | Cluster name (Required) - **slave_root** | String | Path to mesos slave work dir (Required) - **slave_run_directory** | String | Name of mesos slave run dir (Required) + **slave_root** | String | Path to Mesos agent work dir (Required) + **slave_run_directory** | String | Name of Mesos agent run dir (Required) **zk** | String | Hostname of ZooKeeper instance used to resolve Aurora schedulers. **zk_port** | Integer | Port of ZooKeeper instance used to locate Aurora schedulers (Default: 2181) **scheduler_zk_path** | String | ZooKeeper path under which scheduler instances are registered. @@ -46,7 +46,7 @@ any job keys identifying jobs running wi ### `slave_root` -The path on the mesos slaves where executing tasks can be found. It is used in combination with the +The path on the Mesos agents where executing tasks can be found. It is used in combination with the `slave_run_directory` property by `aurora task run` and `aurora task ssh` to change into the sandbox directory after connecting to the host. This value should match the value passed to `mesos-slave` as `-work_dir`. 
Modified: aurora/site/source/documentation/latest/reference/client-commands.md URL: http://svn.apache.org/viewvc/aurora/site/source/documentation/latest/reference/client-commands.md?rev=1748470&r1=1748469&r2=1748470&view=diff ============================================================================== --- aurora/site/source/documentation/latest/reference/client-commands.md (original) +++ aurora/site/source/documentation/latest/reference/client-commands.md Tue Jun 14 21:35:25 2016 @@ -86,7 +86,7 @@ refer to different Jobs. For example, jo `cluster2/foo/prod/workhorse` is different from `cluster1/tyg/test/workhorse.` -Role names are user accounts existing on the slave machines. If you don't know what accounts +Role names are user accounts existing on the agent machines. If you don't know what accounts are available, contact your sysadmin. Environment names are namespaces; you can count on `prod`, `devel` and `test` existing. Modified: aurora/site/source/documentation/latest/reference/configuration-tutorial.md URL: http://svn.apache.org/viewvc/aurora/site/source/documentation/latest/reference/configuration-tutorial.md?rev=1748470&r1=1748469&r2=1748470&view=diff ============================================================================== --- aurora/site/source/documentation/latest/reference/configuration-tutorial.md (original) +++ aurora/site/source/documentation/latest/reference/configuration-tutorial.md Tue Jun 14 21:35:25 2016 @@ -230,7 +230,7 @@ working directory. Typically, you save this code somewhere. You then need to define a Process in your `.aurora` configuration file that fetches the code from that somewhere -to where the slave can see it. For a public cloud, that can be anywhere public on +to where the agent can see it. For a public cloud, that can be anywhere public on the Internet, such as S3. For a private cloud's internal storage, you need to put it on an accessible HDFS cluster or similar storage. 
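The fetch step described above is usually a dedicated `Process` in the `.aurora` file. A hedged sketch (the HDFS path, package name and resource values are hypothetical):

```python
# Hypothetical fetch-then-run pair of Processes in a .aurora file.
fetch = Process(
  name = 'fetch_package',
  cmdline = 'hdfs dfs -get /packages/myservice/myservice.pex .')

run = Process(
  name = 'run_service',
  cmdline = './myservice.pex')

task = SequentialTask(
  processes = [fetch, run],
  resources = Resources(cpu = 1.0, ram = 128*MB, disk = 256*MB))
```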
Modified: aurora/site/source/documentation/latest/reference/configuration.md URL: http://svn.apache.org/viewvc/aurora/site/source/documentation/latest/reference/configuration.md?rev=1748470&r1=1748469&r2=1748470&view=diff ============================================================================== --- aurora/site/source/documentation/latest/reference/configuration.md (original) +++ aurora/site/source/documentation/latest/reference/configuration.md Tue Jun 14 21:35:25 2016 @@ -321,6 +321,7 @@ resources are allocated. ```cpu``` | Float | Fractional number of cores required by the task. ```ram``` | Integer | Bytes of RAM required by the task. ```disk``` | Integer | Bytes of disk required by the task. + ```gpu``` | Integer | Number of GPU cores required by the task Job Schema @@ -328,16 +329,20 @@ Job Schema ### Job Objects +*Note: Specifying a ```Container``` object as the value of the ```container``` property is + deprecated in favor of setting its value directly to the appropriate ```Docker``` or ```Mesos``` + container type* + name | type | description ------ | :-------: | ------- ```task``` | Task | The Task object to bind to this job. Required. ```name``` | String | Job name. (Default: inherited from the task attribute's name) ```role``` | String | Job role account. Required. ```cluster``` | String | Cluster in which this job is scheduled. Required. - ```environment``` | String | Job environment, default ```devel```. Must be one of ```prod```, ```devel```, ```test``` or ```staging```. + ```environment``` | String | Job environment, default ```devel```. Must be one of ```prod```, ```devel```, ```test``` or ```staging```. ```contact``` | String | Best email address to reach the owner of the job. For production jobs, this is usually a team mailing list. ```instances```| Integer | Number of instances (sometimes referred to as replicas or shards) of the task to create. (Default: 1) - ```cron_schedule``` | String | Cron schedule in cron format. 
May only be used with non-service jobs. See [Cron Jobs](../../features/cron-jobs/) for more information. Default: None (not a cron job.) + ```cron_schedule``` | String | Cron schedule in cron format. May only be used with non-service jobs. See [Cron Jobs](../../features/cron-jobs/) for more information. Default: None (not a cron job.) ```cron_collision_policy``` | String | Policy to use when a cron job is triggered while a previous run is still active. KILL_EXISTING Kill the previous run, and schedule the new run CANCEL_NEW Let the previous run continue, and cancel the new run. (Default: KILL_EXISTING) ```update_config``` | ```UpdateConfig``` object | Parameters for controlling the rate and policy of rolling updates. ```constraints``` | dict | Scheduling constraints for the tasks. See the section on the [constraint specification language](#specifying-scheduling-constraints) @@ -346,7 +351,7 @@ Job Schema ```priority``` | Integer | Preemption priority to give the task (Default 0). Tasks with higher priorities may preempt tasks at lower priorities. ```production``` | Boolean | Whether or not this is a production task that may [preempt](../../features/multitenancy/#preemption) other tasks (Default: False). Production job role must have the appropriate [quota](../../features/multitenancy/#preemption). ```health_check_config``` | ```HealthCheckConfig``` object | Parameters for controlling a task's health checks. HTTP health check is only used if a health port was assigned with a command line wildcard. - ```container``` | ```Container``` object | An optional container to run all processes inside of. + ```container``` | Choice of ```Container```, ```Docker``` or ```Mesos``` object | An optional container to run all processes inside of. ```lifecycle``` | ```LifecycleConfig``` object | An optional task lifecycle configuration that dictates commands to be executed on startup/teardown. HTTP lifecycle is enabled by default if the "health" port is requested. 
See [LifecycleConfig Objects](#lifecycleconfig-objects) for more information. ```tier``` | String | Task tier type. The default scheduler tier configuration allows for 3 tiers: `revocable`, `preemptible`, and `preferred`. The `revocable` tier requires the task to run with Mesos revocable resources. Setting the task's tier to `preemptible` allows for the possibility of that task being preempted by other tasks when cluster is running low on resources. The `preferred` tier prevents the task from using revocable resources and from being preempted. Since it is possible that a cluster is configured with a custom tier configuration, users should consult their cluster administrator to be informed of the tiers supported by the cluster. Attempts to schedule jobs with an unsupported tier will be rejected by the scheduler. @@ -367,8 +372,6 @@ Parameters for controlling the rate and ### HealthCheckConfig Objects -*Note: ```endpoint```, ```expected_response``` and ```expected_response_code``` are deprecated from ```HealthCheckConfig``` and must be definied in ```HttpHealthChecker```.* - Parameters for controlling a task's health checks via HTTP or a shell command. | param | type | description @@ -408,12 +411,12 @@ no announcement will take place. For mo documentation. By default, the hostname in the registered endpoints will be the `--hostname` parameter -that is passed to the mesos slave. To override the hostname value, the executor can be started +that is passed to the mesos agent. To override the hostname value, the executor can be started with `--announcer-hostname=`. If you decide to use `--announcer-hostname` and if the overridden value needs to change for every executor, then the executor has to be started inside a wrapper, see [Executor Wrapper](../../operations/configuration/#thermos-executor-wrapper). 
For example, if you want the hostname in the endpoint to be an IP address instead of the hostname, -the `--hostname` parameter to the mesos slave can be set to the machine IP or the executor can +the `--hostname` parameter to the mesos agent can be set to the machine IP or the executor can be started with `--announcer-hostname=` while wrapping the executor inside a script. | object | type | description @@ -443,21 +446,23 @@ find a static port 80. No port would be Static ports should be used cautiously as Aurora does nothing to prevent two tasks with the same static port allocations from being co-scheduled. -External constraints such as slave attributes should be used to enforce such +External constraints such as agent attributes should be used to enforce such guarantees should they be needed. ### Container Objects -*Note: The only container type currently supported is "docker". Docker support is currently EXPERIMENTAL.* +*Note: Both Docker and Mesos unified-container support are currently EXPERIMENTAL.* *Note: In order to correctly execute processes inside a job, the Docker container must have python 2.7 installed.* *Note: For private docker registry, mesos mandates the docker credential file to be named as `.dockercfg`, even though docker may create a credential file with a different name on various platforms. Also, the `.dockercfg` file needs to be copied into the sandbox using the `-thermos_executor_resources` flag, specified while starting Aurora.* -Describes the container the job's processes will run inside. +Describes the container the job's processes will run inside. If not using Docker or the Mesos +unified-container, the container can be omitted from your job config. param | type | description ----- | :----: | ----------- ```docker``` | Docker | A docker container to use. + ```mesos``` | Mesos | A mesos container to use. ### Docker Object @@ -476,6 +481,34 @@ See [Docker Command Line Reference](http ```name``` | String | The name of the docker parameter. 
E.g. volume ```value``` | String | The value of the parameter. E.g. /usr/local/bin:/usr/bin:rw +### Mesos Object + + param | type | description + ----- | :----: | ----------- + ```image``` | Choice(AppcImage, DockerImage) | An optional filesystem image to use within this container. + +### AppcImage + +*Note: In order to correctly execute processes inside a job, the filesystem image must include python 2.7.* + +Describes an AppC filesystem image. + + param | type | description + ----- | :----: | ----------- + ```name``` | String | The name of the appc image. + ```image_id``` | String | The [image id](https://github.com/appc/spec/blob/master/spec/aci.md#image-id) of the appc image. + +### DockerImage + +*Note: In order to correctly execute processes inside a job, the filesystem image must include python 2.7.* + +Describes a Docker filesystem image. + + param | type | description + ----- | :----: | ----------- + ```name``` | String | The name of the docker image. + ```tag``` | String | The tag that identifies the docker image. + ### LifecycleConfig Objects *Note: The only lifecycle configuration supported is the HTTP lifecycle via the HttpLifecycleConfig.* @@ -538,7 +571,7 @@ Aurora client or Aurora-provided service ### mesos Namespace -The `mesos` namespace contains variables which relate to the `mesos` slave +The `mesos` namespace contains variables which relate to the `mesos` agent which launched the task. The `instance` variable can be used to distinguish between Task replicas. @@ -570,4 +603,3 @@ For example, if '{{`thermos.ports[http]` configuration, it is automatically extracted and auto-populated by Aurora, but must be specified with, for example, `thermos -P http:12345` to map `http` to port 12345 when running via the CLI. 
- Modified: aurora/site/source/documentation/latest/reference/scheduler-configuration.md URL: http://svn.apache.org/viewvc/aurora/site/source/documentation/latest/reference/scheduler-configuration.md?rev=1748470&r1=1748469&r2=1748470&view=diff ============================================================================== --- aurora/site/source/documentation/latest/reference/scheduler-configuration.md (original) +++ aurora/site/source/documentation/latest/reference/scheduler-configuration.md Tue Jun 14 21:35:25 2016 @@ -16,6 +16,10 @@ Required flags: Directory to store backups under. Will be created if it does not exist. -cluster_name [not null] Name to identify the cluster being served. +-db_max_active_connection_count [must be > 0] + Max number of connections to use with database via MyBatis +-db_max_idle_connection_count [must be > 0] + Max number of idle connections to the database via MyBatis -framework_authentication_file Properties file which contains framework credentials to authenticate with Mesosmaster. Must contain the properties 'aurora_authentication_principal' and 'aurora_authentication_secret'. -mesos_master_address [not null] @@ -30,8 +34,6 @@ Required flags: Path to the thermos executor entry point. -tier_config [file must be readable] Configuration file defining supported task tiers, task traits and behaviors. --zk_digest_credentials - user:password to use when authenticating with ZooKeeper. -zk_endpoints [must have at least 1 item] Endpoint specification for the ZooKeeper servers. @@ -83,9 +85,11 @@ Optional flags: -flapping_task_threshold (default (5, mins)) A task that repeatedly runs for less than this time is considered to be flapping. -framework_announce_principal (default false) - When 'framework_authentication_file' flag is set, the FrameworkInfo registered with the mesos master will also contain the principal. This is necessary if you intend to use mesos authorization via mesos ACLs. The default will change in a future release. 
+ When 'framework_authentication_file' flag is set, the FrameworkInfo registered with the mesos master will also contain the principal. This is necessary if you intend to use mesos authorization via mesos ACLs. The default will change in a future release. Changing this value is backwards incompatible. For details, see MESOS-703. -framework_failover_timeout (default (21, days)) Time after which a framework is considered deleted. SHOULD BE VERY HIGH. +-framework_name (default TwitterScheduler) + Name used to register the Aurora framework with Mesos. Changing this value can be backwards incompatible. For details, see MESOS-703. -global_container_mounts (default []) A comma separated list of mount points (in host:container form) to mount into all (non-mesos) containers. -history_max_per_job_threshold (default 100) @@ -154,6 +158,8 @@ Optional flags: The timeout for doing log appends and truncations. -native_log_zk_group_path A zookeeper node for use by the native log to track the master coordinator. +-offer_filter_duration (default (5, secs)) + Duration after which we expect Mesos to re-offer unused resources. A short duration improves scheduling performance in smaller clusters, but might lead to resource starvation for other frameworks if you run many frameworks in your cluster. -offer_hold_jitter_window (default (1, mins)) Maximum amount of random jitter to add to the offer hold time window. -offer_reservation_duration (default (3, mins)) @@ -180,7 +186,7 @@ Optional flags: If false, Docker tasks may run without an executor (EXPERIMENTAL) -shiro_ini_path Path to shiro.ini for authentication and authorization configuration. --shiro_realm_modules (default [org.apache.aurora.scheduler.app.MoreModules$1@2d3379b4]) +-shiro_realm_modules (default [org.apache.aurora.scheduler.app.MoreModules$1@13c9d689]) Guice modules for configuring Shiro Realms. -sla_non_prod_metrics (default []) Metric categories collected for non production tasks. 
@@ -206,19 +212,23 @@ Optional flags: A comma separated list of additional resources to copy into the sandbox.Note: if thermos_executor_path is not the thermos_executor.pex file itself, this must include it. -thermos_home_in_sandbox (default false) If true, changes HOME to the sandbox before running the executor. This primarily has the effect of causing the executor and runner to extract themselves into the sandbox. --thermos_observer_root (default /var/run/thermos) - Path to the thermos observer root (by default /var/run/thermos.) -transient_task_state_timeout (default (5, mins)) The amount of time after which to treat a task stuck in a transient state as LOST. -use_beta_db_task_store (default false) Whether to use the experimental database-backed task store. -viz_job_url_prefix (default ) URL prefix for job container stats. +-webhook_config [file must be readable] + File to configure a HTTP webhook to receive task state change events. -zk_chroot_path chroot path to use for the ZooKeeper connections +-zk_digest_credentials + user:password to use when authenticating with ZooKeeper. -zk_in_proc (default false) Launches an embedded zookeeper server for local testing causing -zk_endpoints to be ignored if specified. -zk_session_timeout (default (4, secs)) The ZooKeeper session timeout. +-zk_use_curator (default false) + Uses Apache Curator as the zookeeper client; otherwise a copy of Twitter commons/zookeeper (the legacy library) is used. 
------------------------------------------------------------------------- ``` Modified: aurora/site/source/documentation/latest/reference/task-lifecycle.md URL: http://svn.apache.org/viewvc/aurora/site/source/documentation/latest/reference/task-lifecycle.md?rev=1748470&r1=1748469&r2=1748470&view=diff ============================================================================== --- aurora/site/source/documentation/latest/reference/task-lifecycle.md (original) +++ aurora/site/source/documentation/latest/reference/task-lifecycle.md Tue Jun 14 21:35:25 2016 @@ -26,14 +26,14 @@ particular role" or attribute limit cons finds a suitable match, it assigns the `Task` to a machine and puts the `Task` into the `ASSIGNED` state. -From the `ASSIGNED` state, the scheduler sends an RPC to the slave -machine containing `Task` configuration, which the slave uses to spawn +From the `ASSIGNED` state, the scheduler sends an RPC to the agent +machine containing `Task` configuration, which the agent uses to spawn an executor responsible for the `Task`'s lifecycle. When the scheduler receives an acknowledgment that the machine has accepted the `Task`, the `Task` goes into `STARTING` state. `STARTING` state initializes a `Task` sandbox. When the sandbox is fully -initialized, Thermos begins to invoke `Process`es. Also, the slave +initialized, Thermos begins to invoke `Process`es. Also, the agent machine sends an update to the scheduler that the `Task` is in `RUNNING` state. @@ -67,7 +67,7 @@ failure. ### Forceful Termination: KILLING, RESTARTING You can terminate a `Task` by issuing an `aurora job kill` command, which -moves it into `KILLING` state. The scheduler then sends the slave a +moves it into `KILLING` state. The scheduler then sends the agent a request to terminate the `Task`. If the scheduler receives a successful response, it moves the Task into `KILLED` state and never restarts it. 
@@ -75,7 +75,7 @@ If a `Task` is forced into the `RESTARTI command, the scheduler kills the underlying task but in parallel schedules an identical replacement for it. -In any case, the responsible executor on the slave follows an escalation +In any case, the responsible executor on the agent follows an escalation sequence when killing a running task: 1. If a `HttpLifecycleConfig` is not present, skip to (4). @@ -95,9 +95,9 @@ If a `Task` stays in a transient task st or `STARTING`), the scheduler forces it into `LOST` state, creating a new `Task` in its place that's sent into `PENDING` state. -In addition, if the Mesos core tells the scheduler that a slave has +In addition, if the Mesos core tells the scheduler that an agent has become unhealthy (or outright disappeared), the `Task`s assigned to that -slave go into `LOST` state and new `Task`s are created in their place. +agent go into `LOST` state and new `Task`s are created in their place. From `PENDING` state, there is no guarantee a `Task` will be reassigned to the same machine unless job constraints explicitly force it there. @@ -121,9 +121,9 @@ preempted in favor of production tasks. ### Making Room for Maintenance: DRAINING -Cluster operators can set slave into maintenance mode. This will transition -all `Task` running on this slave into `DRAINING` and eventually to `KILLED`. -Drained `Task`s will be restarted on other slaves for which no maintenance +Cluster operators can set an agent into maintenance mode. This will transition +all `Task`s running on this agent into `DRAINING` and eventually to `KILLED`. +Drained `Task`s will be restarted on other agents for which no maintenance has been announced yet.
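Whether a task leaves via `KILLING` or `DRAINING`, it passes through the same executor-side escalation. A rough Python sketch of that sequence follows; the two endpoint names are the `HttpLifecycleConfig` defaults, while the helper itself, the liveness probing and the wait times are assumptions, not Aurora's actual code:

```python
import os
import signal
import time
import urllib.request

def _alive(pid):
    """Best-effort liveness probe via signal 0."""
    try:
        os.kill(pid, 0)
        return True
    except OSError:
        return False

def kill_task(pid, http_port=None, grace=5.0):
    """Rough sketch of the kill escalation: polite HTTP shutdown first
    (if a lifecycle port exists), then SIGTERM, then SIGKILL."""
    if http_port is not None:
        # Ask the task to shut down over HTTP, politely first.
        for endpoint in ('/quitquitquit', '/abortabortabort'):
            try:
                urllib.request.urlopen(
                    'http://localhost:%d%s' % (http_port, endpoint), data=b'')
            except OSError:
                pass  # task may not serve the endpoint; keep escalating
            time.sleep(grace)
            if not _alive(pid):
                return
    os.kill(pid, signal.SIGTERM)
    time.sleep(grace)
    if _alive(pid):
        os.kill(pid, signal.SIGKILL)
```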