mesos-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From vinodk...@apache.org
Subject mesos git commit: Added user doc for Scheduler HTTP API.
Date Tue, 18 Aug 2015 04:50:54 GMT
Repository: mesos
Updated Branches:
  refs/heads/master 2dba32978 -> 44666db57


Added user doc for Scheduler HTTP API.

Review: https://reviews.apache.org/r/37512


Project: http://git-wip-us.apache.org/repos/asf/mesos/repo
Commit: http://git-wip-us.apache.org/repos/asf/mesos/commit/44666db5
Tree: http://git-wip-us.apache.org/repos/asf/mesos/tree/44666db5
Diff: http://git-wip-us.apache.org/repos/asf/mesos/diff/44666db5

Branch: refs/heads/master
Commit: 44666db57f69c03e344160df79bab3f20bc15c4e
Parents: 2dba329
Author: Vinod Kone <vinodkone@gmail.com>
Authored: Sun Aug 16 13:05:57 2015 -0700
Committer: Vinod Kone <vinodkone@gmail.com>
Committed: Mon Aug 17 21:50:41 2015 -0700

----------------------------------------------------------------------
 docs/scheduler_http_api.md | 519 ++++++++++++++++++++++++++++++++++++++++
 1 file changed, 519 insertions(+)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/mesos/blob/44666db5/docs/scheduler_http_api.md
----------------------------------------------------------------------
diff --git a/docs/scheduler_http_api.md b/docs/scheduler_http_api.md
new file mode 100644
index 0000000..11f4d83
--- /dev/null
+++ b/docs/scheduler_http_api.md
@@ -0,0 +1,519 @@
+---
+layout: documentation
+---
+
+# Scheduler HTTP API
+
+Mesos 0.24.0 added **experimental** support for v1 Scheduler HTTP API.
+
+
+## Overview
+
+The scheduler interacts with Mesos via  “/api/v1/scheduler” endpoint hosted by the Mesos
master. The fully qualified URL of the endpoint might look like:
+
+	http://masterhost:5050/api/v1/scheduler
+
+Note that we refer to this endpoint with its suffix "/scheduler" in the rest of this document.
This endpoint accepts HTTP POST requests with data encoded as JSON (Content-Type: application/json)
or binary Protobuf (Content-Type: application/x-protobuf). The first request that a scheduler
sends to “/scheduler” endpoint is called SUBSCRIBE and results in a streaming response
(“200 OK” status code with Transfer-Encoding: chunked). **Schedulers are expected to keep
the subscription connection open as long as possible (barring errors in network, software,
hardware etc.) and incrementally process the response** (NOTE: HTTP client libraries that
can only parse the response after the connection is closed cannot be used). For the encoding
used, please refer to **Events** section below.
+
+All the subsequent (non subscribe) requests to “/scheduler” endpoint (see details below
in **Calls** section) must be sent using a different connection(s) than the one being used
for subscription. Master responds to these HTTP POST requests with “202 Accepted” status
codes (or, for unsuccessful requests, with 4xx or 5xx status codes; details in later sections).
The “202 Accepted” response means that a request has been accepted for processing, not
that the processing of the request has been completed. The request might or might not be acted
upon by Mesos (e.g., master fails during the processing of the request). Any asynchronous
responses from these requests will be streamed on long-lived subscription connection.
+
+
+## Calls
+
+The following calls are currently accepted by the master. The canonical source of this information
is [scheduler.proto](https://github.com/apache/mesos/blob/master/include/mesos/v1/scheduler/scheduler.proto)
(NOTE: The protobuf definitions are subject to change before the beta API is finalized). Note
that when sending JSON encoded Calls, schedulers should encode raw bytes in Base64 and strings
in UTF-8.
+
+### SUBSCRIBE
+
+This is the first step in the communication process between the scheduler and the master.
This is also to be considered as subscription to the “/scheduler” events stream.
+
+To subscribe with the master, the scheduler sends a HTTP POST request with encoded  `SUBSCRIBE`
message with the required FrameworkInfo. Note that if "subscribe.framework_info.id" is not
set, master considers the scheduler as a new one and subscribes it by assigning it a FrameworkID.
The HTTP response is a stream with RecordIO encoding, with the first event being `SUBSCRIBED`
event (see details in **Events** section).
+
+```
+SUBSCRIBE Request (JSON):
+
+POST /api/v1/scheduler  HTTP/1.1
+
+Host: masterhost:5050
+Content-Type: application/json
+Accept: application/json
+Connection: close
+
+{
+   “type”		: “SUBSCRIBE”,
+
+   “subscribe”	: {
+      “framework_info”	: {
+        “user” :  “foo”,
+        “name” :  “Example HTTP Framework”
+      },
+
+      “force” : true
+  }
+}
+
+SUBSCRIBE Response Event (JSON):
+HTTP/1.1 200 OK
+
+Content-Type: application/json
+Transfer-Encoding: chunked
+
+<event length>
+{
+ “type”			: “SUBSCRIBED”,
+ “subscribed”	: {
+     “framework_id”               : {“value”:“12220-3440-12532-2345”},
+     "heartbeat_interval_seconds" : 15
+  }
+}
+<more events>
+```
+
+Alternatively, if “subscribe.framework_info.id” is set, master considers this a request
from an already subscribed scheduler reconnecting after a disconnection (e.g., due to failover
or network disconnection) and responds with `SUBSCRIBED` event containing the same FrameworkID.
The “subscribe.force” field describes how the master reacts when multiple scheduler instances
(with the same framework id) attempt to subscribe with the master at the same time (e.g.,
due to network partition). See the semantics in **Disconnections** section below.
+
+NOTE: In the old version of the API, (re-)registered callbacks also included MasterInfo,
which contained information about the master the driver currently connected to. With the new
API, since schedulers explicitly subscribe with the leading master (see details below in **Master
Detection** section), it’s not relevant anymore.
+
+If subscription fails for whatever reason (e.g., invalid request), a HTTP 4xx response is
returned with the error message as part of the body and the connection is closed.
+
+Scheduler must make additional HTTP requests to the “/scheduler” endpoint only after
it has opened a persistent connection to it by sending a `SUBSCRIBE` request and received
a `SUBSCRIBED` response. Calls made without subscription will result in a “403 Forbidden“
instead of a “202 Accepted“ response. A scheduler might also receive a “400 Bad Request”
response if the HTTP request is malformed (e.g., malformed HTTP headers).
+
+### TEARDOWN
+Sent by the scheduler when it wants to tear itself down. When Mesos receives this request
it will shut down all executors (and consequently kill tasks) and remove persistent volumes
(if requested). It then removes the framework and closes all open connections from this scheduler
to the Master.
+
+```
+TEARDOWN Request (JSON):
+POST /api/v1/scheduler  HTTP/1.1
+
+Host: masterhost:5050
+Content-Type: application/json
+
+{
+  “framework_id”	: {“value” : “12220-3440-12532-2345”},
+  “type”			: “TEARDOWN”,
+}
+
+TEARDOWN Response:
+HTTP/1.1 202 Accepted
+```
+
+### ACCEPT
+Sent by the scheduler when it accepts offer(s) sent by the master. The `ACCEPT` request includes
the type of operations (e.g., launch task, reserve resources, create volumes) that the scheduler
wants to perform on the offers. Note that until the scheduler replies (accepts or declines)
to an offer, its resources are considered allocated to the framework. Also, any of the offer’s
resources not used in the `ACCEPT` call (e.g., to launch a task) are considered declined and
might be reoffered to other frameworks. In other words, the same `OfferID` cannot be used
in more than one `ACCEPT` call. These semantics might change when we add new features to Mesos
(e.g., persistence, reservations, optimistic offers, resizeTask, etc.).
+
+```
+ACCEPT Request (JSON):
+POST /api/v1/scheduler  HTTP/1.1
+
+Host: masterhost:5050
+Content-Type: application/json
+
+{
+  “framework_id”	: {“value” : “12220-3440-12532-2345”},
+  “type”			: “ACCEPT”,
+  “accept”			: {
+    “offer_ids”		: [
+                       {“value” : “12220-3440-12532-O12”},
+                       {“value” : “12220-3440-12532-O12”}
+                      ],
+    “operations”	: [ {“type” : “LAUNCH”, “launch” : {...}} ],
+    “filters”		: {...}
+  }
+}
+
+ACCEPT Response:
+HTTP/1.1 202 Accepted
+
+```
+
+### DECLINE
+Sent by the scheduler to explicitly decline offer(s) received. Note that this is same as
sending an `ACCEPT` call with no operations.
+
+```
+DECLINE Request (JSON):
+POST /api/v1/scheduler  HTTP/1.1
+
+Host: masterhost:5050
+Content-Type: application/json
+
+{
+  “framework_id”	: {“value” : “12220-3440-12532-2345”},
+  “type”			: “DECLINE”,
+  “decline”			: {
+    “offer_ids”	: [
+                   {“value” : “12220-3440-12532-O12”},
+                   {“value” : “12220-3440-12532-O13”}
+                  ],
+    “filters”	: {...}
+  }
+}
+
+DECLINE Response:
+HTTP/1.1 202 Accepted
+
+```
+
+### REVIVE
+Sent by the scheduler to remove any/all filters that it has previously set via `ACCEPT` or
`DECLINE` calls.
+
+```
+REVIVE Request (JSON):
+POST /api/v1/scheduler  HTTP/1.1
+
+Host: masterhost:5050
+Content-Type: application/json
+
+{
+  “framework_id”	: {“value” : “12220-3440-12532-2345”},
+  “type”			: “REVIVE”,
+}
+
+REVIVE Response:
+HTTP/1.1 202 Accepted
+
+```
+
+### KILL
+Sent by the scheduler to kill a specific task. If the scheduler has a custom executor, the
kill is forwarded to the executor and it is up to the executor to kill the task and send a
`TASK_KILLED` (or `TASK_FAILED`) update. Mesos releases the resources for a task once it receives
a terminal update for the task. If the task is unknown to the master, a `TASK_LOST` will be
generated.
+
+```
+KILL Request (JSON):
+POST /api/v1/scheduler  HTTP/1.1
+
+Host: masterhost:5050
+Content-Type: application/json
+
+{
+  “framework_id”	: {“value” : “12220-3440-12532-2345”},
+  “type”			: “KILL”,
+  “kill”			: {
+    “task_id”	:  {“value” : “12220-3440-12532-my-task”},
+    “slave_id”	:  {“value” : “12220-3440-12532-S1233”}
+  }
+}
+
+KILL Response:
+HTTP/1.1 202 Accepted
+
+```
+
+### SHUTDOWN
+Sent by the scheduler to shutdown a specific custom executor (NOTE: This is a new call that
was not present in the old API). When an executor gets a shutdown event, it is expected to
kill all its tasks (and send `TASK_KILLED` updates) and terminate. If an executor doesn’t
terminate within a certain timeout (configurable via  “--executor_shutdown_grace_period”
slave flag), the slave will forcefully destroy the container (executor and its tasks) and
transitions its active tasks to `TASK_LOST`.
+
+```
+SHUTDOWN Request (JSON):
+POST /api/v1/scheduler  HTTP/1.1
+
+Host: masterhost:5050
+Content-Type: application/json
+
+{
+  “framework_id”	: {“value” : “12220-3440-12532-2345”},
+  “type”			: “SHUTDOWN”,
+  “shutdown”		: {
+    “executor_id”	:  {“value” : “123450-2340-1232-my-executor”},
+    “slave_id”		:  {“value” : “12220-3440-12532-S1233”}
+  }
+}
+
+SHUTDOWN Response:
+HTTP/1.1 202 Accepted
+
+```
+
+### ACKNOWLEDGE
+Sent by the scheduler to acknowledge a status update. Note that with the new API, schedulers
are responsible for explicitly acknowledging the receipt of status updates that have “status.uuid()”
set. These status updates are reliably retried until they are acknowledged by the scheduler.
The scheduler must not acknowledge status updates that do not have “status.uuid()” set
as they are not retried. "uuid" is raw bytes encoded in Base64.
+
+```
+ACKNOWLEDGE Request (JSON):
+POST /api/v1/scheduler  HTTP/1.1
+
+Host: masterhost:5050
+Content-Type: application/json
+
+{
+  “framework_id”	: {“value” : “12220-3440-12532-2345”},
+  “type”			: “ACKNOWLEDGE”,
+  “acknowledge”		: {
+    “slave_id”	:  {“value” : “12220-3440-12532-S1233”},
+    “task_id”	:  {“value” : “12220-3440-12532-my-task”},
+    “uuid”		:  “jhadf73jhakdlfha723adf”
+  }
+}
+
+ACKNOWLEDGE Response:
+HTTP/1.1 202 Accepted
+
+```
+
+### RECONCILE
+Sent by the scheduler to query the status of non-terminal tasks. This causes the master to
send back `UPDATE` events for each task in the list. Tasks that are no longer known to Mesos
will result in `TASK_LOST` updates. If the list of tasks is empty, master will send `UPDATE`
events for all currently known tasks of the framework.
+
+```
+RECONCILE Request (JSON):
+POST /api/v1/scheduler   HTTP/1.1
+
+Host: masterhost:5050
+Content-Type: application/json
+
+{
+  “framework_id”	: {“value” : “12220-3440-12532-2345”},
+  “type”			: “RECONCILE”,
+  “reconcile”		: {
+    “tasks”		: [
+                   { “task_id”  : { “value” : “312325” },
+                     “slave_id” : { “value” : “123535” }
+                   }
+                  ]
+  }
+}
+
+RECONCILE Response:
+HTTP/1.1 202 Accepted
+
+```
+
+### MESSAGE
+Sent by the scheduler to send arbitrary binary data to the executor. Note that Mesos neither
interprets this data nor makes any guarantees about the delivery of this message to the executor.
"data" is raw bytes encoded in Base64.
+
+```
+MESSAGE Request (JSON):
+POST /api/v1/scheduler   HTTP/1.1
+
+Host: masterhost:5050
+Content-Type: application/json
+
+{
+  “framework_id”	: {“value” : “12220-3440-12532-2345”},
+  “type”			: “MESSAGE”,
+  “message”			: {
+    “slave_id”       : {“value” : “12220-3440-12532-S1233”},
+    “executor_id”    : {“value” : “my-framework-executor”},
+    “data”           : “adaf838jahd748jnaldf”
+  }
+}
+
+MESSAGE Response:
+HTTP/1.1 202 Accepted
+
+```
+
+### REQUEST
+Sent by the scheduler to request resources from the master/allocator. The built-in hierarchical
allocator simply ignores this request but other allocators (modules) can interpret this in
a customizable fashion.
+
+```
+Request (JSON):
+POST /api/v1/scheduler   HTTP/1.1
+
+Host: masterhost:5050
+Content-Type: application/json
+
+{
+  “framework_id”	: {“value” : “12220-3440-12532-2345”},
+  “type”			: “REQUEST”,
+  “requests”		: [
+      {
+         “slave_id”       : {“value” : “12220-3440-12532-S1233”},
+         “resources”      : {}
+      },
+  ]
+}
+
+REQUEST Response:
+HTTP/1.1 202 Accepted
+
+```
+
+## Events
+
+Scheduler is expected to keep a **persistent** connection open to “/scheduler” endpoint
even after getting a SUBSCRIBED HTTP Response event. This is indicated by “Connection: keep-alive”
and “Transfer-Encoding: chunked” headers with *no* “Content-Length” header set. All
subsequent events that are relevant to this framework  generated by Mesos are streamed on
this connection. Master encodes each Event in RecordIO format, i.e., string representation
of length of the event in bytes followed by JSON or binary Protobuf  (possibly compressed)
encoded event. Note that the value of length will never be ‘0’ and the size of the length
will be the size of unsigned integer (i.e., 64 bits). Also, note that the RecordIO encoding
should be decoded by the scheduler whereas the underlying HTTP chunked encoding is typically
invisible at the application (scheduler) layer. The type of content encoding used for the
events will be determined by the accept header of the POST request (e
 .g., Accept: application/json).
+
+The following events are currently sent by the master. The canonical source of this information
is at [scheduler.proto](include/mesos/v1/scheduler/scheduler.proto). Note that when sending
JSON encoded events, master encodes raw bytes in Base64 and strings in UTF-8.
+
+### SUBSCRIBED
+The first event sent by the master when the scheduler sends a `SUBSCRIBE` request on the
persistent connection. See `SUBSCRIBE` in Calls section for the format.
+
+
+### OFFERS
+Sent by the master whenever there are new resources that can be offered to the framework.
Each offer corresponds to a set of resources on a slave. Until the scheduler 'Accept's or
'Decline's an offer the resources are considered allocated to the scheduler, unless the offer
is otherwise rescinded, e.g. due to a lost slave or `--offer_timeout`.
+
+```
+OFFERS Event (JSON)
+<event-length>
+{
+  “type”	: “OFFERS”,
+  “offers”	: [
+    {
+      “offer_id”:{“value”: “12214-23523-O235235”},
+      “framework_id”:{“value”: “12124-235325-32425”},
+      “slave_id”:{“value”: “12325-23523-S23523”},
+      “hostname”:“slave.host”,
+      “resources”:[...],
+      “attributes”:[...],
+      “executor_ids”:[]
+    }
+  ]
+}
+```
+
+### RESCIND
+Sent by the master when a particular offer is no longer valid (e.g., the slave corresponding
to the offer has been removed) and hence needs to be rescinded. Any future calls (`ACCEPT`
/ `DECLINE`) made by the scheduler regarding this offer will be invalid.
+
+```
+RESCIND Event (JSON)
+<event-length>
+{
+  “type”	: “RESCIND”,
+  “rescind”	: {
+    “offer_id”	: { “value” : “12214-23523-O235235”}
+  }
+}
+```
+
+### UPDATE
+Sent by the master whenever there is a status update that is generated by the executor, slave
or master. Status updates should be used by executors to reliably communicate the status of
the tasks that they manage. It is crucial that a terminal update (e.g., `TASK_FINISHED`, `TASK_KILLED`,
`TASK_FAILED`) is sent by the executor as soon as the task terminates, in order for Mesos
to release the resources allocated to the task. It is also the responsibility of the scheduler
to explicitly acknowledge the receipt of status updates that are reliably retried. See `ACKNOWLEDGE`
in the Calls section above for the semantics. Note that `uuid` and `data` are raw bytes encoded
in Base64.
+
+
+```
+UPDATE Event (JSON)
+
+<event-length>
+{
+  “type”	: “UPDATE”,
+  “update”	: {
+    “status”	: {
+        “task_id”	: { “value” : “12344-my-task”},
+        “state”		: “TASK_RUNNING”,
+        “source”	: “SOURCE_EXECUTOR”,
+        “uuid”		: “adfadfadbhgvjayd23r2uahj”,
+        "bytes"		: "uhdjfhuagdj63d7hadkf"
+
+      }
+  }
+}
+```
+
+### MESSAGE
+A custom message generated by the executor that is forwarded to the scheduler by the master.
Note that this message is not interpreted by Mesos and is only forwarded (without reliability
guarantees) to the scheduler. It is up to the executor to retry if the message is dropped
 for any reason. Note that `data` is raw bytes encoded as Base64.
+
+```
+MESSAGE Event (JSON)
+
+<event-length>
+{
+  “type”	: “MESSAGE”,
+  “message”	: {
+    “slave_id”		: { “value” : “12214-23523-S235235”},
+    “executor_id”	: { “value” : “12214-23523-my-executor”},
+    “data”			: “adfadf3t2wa3353dfadf”
+  }
+}
+```
+
+
+### FAILURE
+Sent by the master when a slave is removed from the cluster (e.g., failed health checks)
or when an executor is terminated. Note that, this event coincides with receipt of terminal
`UPDATE` events for any active tasks belonging to the slave or executor and receipt of `RESCIND`
events for any outstanding offers belonging to the slave. Note that there is no guaranteed
order between the `FAILURE`, `UPDATE` and `RESCIND` events.
+
+```
+FAILURE Event (JSON)
+
+<event-length>
+{
+  “type”	: “FAILURE”,
+  “failure”	: {
+    “slave_id”		: { “value” : “12214-23523-S235235”},
+    “executor_id”	: { “value” : “12214-23523-my-executor”},
+    “status”		: 1
+  }
+}
+```
+
+### ERROR
+Sent by the master when an asynchronous error event is generated (e.g., a framework is not
authorized to subscribe with the given role). It is recommended that the framework abort when
it receives an error and retry subscription as necessary.
+
+```
+ERROR Event (JSON)
+
+<event-length>
+{
+  “type”	: “ERROR”,
+  “message”	: “Framework is not authorized”
+}
+```
+
+### HEARTBEAT
+This event is periodically sent by the master to inform the scheduler that a connection is
alive. This also helps ensure that network intermediates do not close the persistent subscription
connection due to lack of data flow. See the next section on how a scheduler can use this
event to deal with network partitions.
+
+```
+HEARTBEAT Event (JSON)
+
+<event-length>
+{
+  “type”	: “HEARTBEAT”,
+}
+```
+
+## Disconnections
+
+Master considers a scheduler disconnected if the persistent subscription connection (opened
via `SUBSCRIBE` request) to “/scheduler” breaks. The connection could break for several
reasons, e.g., scheduler restart, scheduler failover, network error. Note that the master
doesn’t keep track of non-subscription connection(s) to
+“/scheduler” because it is not expected to be a persistent connection.
+
+If master realizes that the subscription connection is broken, it marks the scheduler as
“disconnected” and starts a failover timeout (failover timeout is part of FrameworkInfo).
It also drops any pending events in its queue. Additionally, it rejects subsequent non-subscribe
HTTP requests to “/scheduler” with “403 Forbidden”, until the scheduler subscribes
again with “/scheduler”. If the scheduler *does not* re-subscribe within the failover
timeout, the master considers the scheduler gone forever and shuts down all its executors,
thus killing all its tasks. Therefore, all production schedulers are recommended to use a
high value (e.g., 4 weeks) for the failover timeout.
+
+NOTE: To force shutdown a framework before the framework timeout elapses (e.g., during framework
development and testing), either the framework can send `TEARDOWN` call (part of Scheduler
API) or an operator can use the “/master/teardown” endpoint (part of Operator API).
+
+If the scheduler realizes that its subscription connection to “/scheduler” is broken,
it should attempt to open a new persistent connection to the
+“/scheduler” (on possibly new master based on the result of master detection) and resubscribe.
It should not send new non-subscribe HTTP requests to “/scheduler” unless it gets a `SUBSCRIBED`
event; such requests will result in “403 Forbidden”.
+
+If the master does not realize that the subscription connection is broken, but the scheduler
realizes it, the scheduler might open a new persistent connection to
+“/scheduler” via `SUBSCRIBE`. In this case, the semantics depend on the value of `subscribe.force`.
If set to true, master closes the existing subscription connection and allows subscription
on the new connection. If set to false, the new connection attempt is disallowed in favor
of the existing connection. The invariant here is that, only one persistent subscription connection
for a given FrameworkID is allowed on the master. For HA schedulers, it is recommended that
a scheduler instance set `subscribe.force` to true only when it just got elected and set it
to false for all subsequent reconnection attempts (e.g, due to disconnection or master failover).
+
+### Network partitions
+
+In the case of a network partition, the subscription connection between the scheduler and
master might not necessarily break. To be able to detect this scenario, master periodically
(e.g., 15s) sends `HEARTBEAT` events (similar in vein to Twitter’s Streaming API). If a
scheduler doesn’t receive a bunch (e.g., 5) of these heartbeats within a time window, it
should immediately disconnect and try to re-subscribe. It is highly recommended for schedulers
to use an exponential backoff strategy (e.g., upto a maximum of 15s) to avoid overwhelming
the master while reconnecting. Schedulers can use a similar timeout (e.g., 75s) for receiving
responses to any HTTP requests.
+
+## Master detection
+
+Mesos has a high-availability mode that uses multiple Mesos masters; one active master (called
the leader or leading master) and several standbys in case it fails. The masters elect the
leader, with ZooKeeper coordinating the election. For more details please refer to the [documentation](high-availability.md).
+
+Schedulers are expected to make HTTP requests to the leading master. If requests are made
to a non-leading master a “HTTP 307 Temporary Redirect” will be received with the “Location”
header pointing to the leading master.
+
+Example subscription workflow with redirection when the scheduler hits a non-leading master.
+
+```
+Scheduler → Master
+POST /api/v1/scheduler  HTTP/1.1
+
+Host: masterhost1:5050
+Content-Type: application/json
+Accept: application/json
+Connection: keep-alive
+
+{
+  “framework_info”	: {
+    “user” :  “foo”,
+    “name” :  “Example HTTP Framework”
+  },
+  “type”			: “SUBSCRIBE”
+}
+
+Master → Scheduler
+HTTP/1.1 307 Temporary Redirect
+Location: masterhost2:5050
+
+
+Scheduler → Master
+POST /api/v1/scheduler  HTTP/1.1
+
+Host: masterhost2:5050
+Content-Type: application/json
+Accept: application/json
+Connection: keep-alive
+
+{
+  “framework_info”	: {
+    “user” :  “foo”,
+    “name” :  “Example HTTP Framework”
+  },
+  “type”			: “SUBSCRIBE”
+}
+```
+
+If the scheduler knows the list of master’s hostnames for a cluster, it could use this
mechanism to find the leading master to subscribe with. Alternatively, the scheduler could
use a pure language library that detects the leading master given a ZooKeeper (or etcd) URL.
For a `C++` library that does ZooKeeper based master detection please look at `src/scheduler/scheduler.cpp`.


Mime
View raw message