Return-Path: X-Original-To: archive-asf-public-internal@cust-asf2.ponee.io Delivered-To: archive-asf-public-internal@cust-asf2.ponee.io Received: from cust-asf.ponee.io (cust-asf.ponee.io [163.172.22.183]) by cust-asf2.ponee.io (Postfix) with ESMTP id F338D200C0F for ; Thu, 2 Feb 2017 22:27:30 +0100 (CET) Received: by cust-asf.ponee.io (Postfix) id F1C84160B57; Thu, 2 Feb 2017 21:27:30 +0000 (UTC) Delivered-To: archive-asf-public@cust-asf.ponee.io Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by cust-asf.ponee.io (Postfix) with SMTP id EFD59160B44 for ; Thu, 2 Feb 2017 22:27:29 +0100 (CET) Received: (qmail 90181 invoked by uid 500); 2 Feb 2017 21:27:29 -0000 Mailing-List: contact commits-help@flink.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: dev@flink.apache.org Delivered-To: mailing list commits@flink.apache.org Received: (qmail 90172 invoked by uid 99); 2 Feb 2017 21:27:29 -0000 Received: from git1-us-west.apache.org (HELO git1-us-west.apache.org) (140.211.11.23) by apache.org (qpsmtpd/0.29) with ESMTP; Thu, 02 Feb 2017 21:27:29 +0000 Received: by git1-us-west.apache.org (ASF Mail Server at git1-us-west.apache.org, from userid 33) id 1AD64DFC12; Thu, 2 Feb 2017 21:27:29 +0000 (UTC) Content-Type: text/plain; charset="us-ascii" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit From: trohrmann@apache.org To: commits@flink.apache.org Message-Id: <95be5ebc9d7f4a259a1dcfff3f176919@git.apache.org> X-Mailer: ASF-Git Admin Mailer Subject: flink git commit: [FLINK-5494] [docs] Add more details to the Mesos documentation Date: Thu, 2 Feb 2017 21:27:29 +0000 (UTC) archived-at: Thu, 02 Feb 2017 21:27:31 -0000 Repository: flink Updated Branches: refs/heads/release-1.2 b0a95a9e5 -> 8d81bfdaa [FLINK-5494] [docs] Add more details to the Mesos documentation This closes #3236. Project: http://git-wip-us.apache.org/repos/asf/flink/repo Commit: http://git-wip-us.apache.org/repos/asf/flink/commit/8d81bfda Tree: http://git-wip-us.apache.org/repos/asf/flink/tree/8d81bfda Diff: http://git-wip-us.apache.org/repos/asf/flink/diff/8d81bfda Branch: refs/heads/release-1.2 Commit: 8d81bfdaa2cd18ca9ef84c1618fc8c94e0f0e199 Parents: b0a95a9 Author: Till Rohrmann Authored: Mon Jan 30 18:19:07 2017 +0100 Committer: Till Rohrmann Committed: Thu Feb 2 22:27:17 2017 +0100 ---------------------------------------------------------------------- docs/setup/config.md | 40 +++++++++++ docs/setup/mesos.md | 165 ++++++++++++++++++++++++++-------------------- 2 files changed, 135 insertions(+), 70 deletions(-) ---------------------------------------------------------------------- http://git-wip-us.apache.org/repos/asf/flink/blob/8d81bfda/docs/setup/config.md ---------------------------------------------------------------------- diff --git a/docs/setup/config.md b/docs/setup/config.md index 2c99a95..bf92f56 100644 --- a/docs/setup/config.md +++ b/docs/setup/config.md @@ -426,6 +426,46 @@ of the JobManager, because the same ActorSystem is used. Its not possible to use For example when running Flink on YARN on an environment with a restrictive firewall, this option allows specifying a range of allowed ports. +### Mesos + + +- `mesos.initial-tasks`: The initial workers to bring up when the master starts (**DEFAULT**: The number of workers specified at cluster startup). + +- `mesos.maximum-failed-tasks`: The maximum number of failed workers before the cluster fails (**DEFAULT**: Number of initial workers). +May be set to -1 to disable this feature. + +- `mesos.master`: The Mesos master URL. The value should be in one of the following forms: + * `host:port` + * `zk://host1:port1,host2:port2,.../path` + * `zk://username:password@host1:port1,host2:port2,.../path` + * `file:///path/to/file` + + +- `mesos.failover-timeout`: The failover timeout in seconds for the Mesos scheduler, after which running tasks are automatically shut down (**DEFAULT:** 600). + +- `mesos.resourcemanager.artifactserver.port`:The config parameter defining the Mesos artifact server port to use. Setting the port to 0 will let the OS choose an available port. + +- `mesos.resourcemanager.framework.name`: Mesos framework name (**DEFAULT:** Flink) + +- `mesos.resourcemanager.framework.role`: Mesos framework role definition (**DEFAULT:** *) + +- `mesos.resourcemanager.framework.principal`: Mesos framework principal (**NO DEFAULT**) + +- `mesos.resourcemanager.framework.secret`: Mesos framework secret (**NO DEFAULT**) + +- `mesos.resourcemanager.framework.user`: Mesos framework user (**DEFAULT:**"") + +- `mesos.resourcemanager.artifactserver.ssl.enabled`: Enables SSL for the Flink artifact server (**DEFAULT**: true). Note that `security.ssl.enabled` also needs to be set to `true` encryption to enable encryption. + +- `mesos.resourcemanager.tasks.mem`: Memory to assign to the Mesos workers in MB (**DEFAULT**: 1024) + +- `mesos.resourcemanager.tasks.cpus`: CPUs to assign to the Mesos workers (**DEFAULT**: 0.0) + +- `mesos.resourcemanager.tasks.container.type`: Type of the containerization used: "mesos" or "docker" (DEFAULT: mesos); + +- `mesos.resourcemanager.tasks.container.image.name`: Image name to use for the container (**NO DEFAULT**) + +- `recovery.zookeeper.path.mesos-workers`: The ZooKeeper root path for persisting the Mesos worker information. ### High Availability (HA) http://git-wip-us.apache.org/repos/asf/flink/blob/8d81bfda/docs/setup/mesos.md ---------------------------------------------------------------------- diff --git a/docs/setup/mesos.md b/docs/setup/mesos.md index b85a593..032477a 100644 --- a/docs/setup/mesos.md +++ b/docs/setup/mesos.md @@ -114,29 +114,55 @@ Now you can use this address to submit a job to your cluster via ## Mesos without DC/OS -Let's take a look at how to setup Flink on Mesos without DC/OS. +You can also run Mesos without DC/OS. -### Prerequisites +### Installing Mesos -Please follow the -[instructions on how to setup Mesos on the official website](http://mesos.apache.org/documentation/latest/). +Please follow the [instructions on how to setup Mesos on the official website](http://mesos.apache.org/documentation/latest/getting-started/). -### Optional dependencies +After installation you have to configure the set of master and agent nodes by creating the files `MESOS_HOME/etc/mesos/masters` and `MESOS_HOME/etc/mesos/slaves`. +These files contain in each row a single hostname on which the respective component will be started (assuming SSH access to these nodes). -Optionally, -you may also install [Marathon](https://mesosphere.github.io/marathon/) which -will be necessary if you want your Flink cluster to be highly available in the -presence of master node failures. Additionally, you probably want to install a -distributed file system to share data across nodes and make use of Flink's -checkpointing mechanism. +Next you have to create `MESOS_HOME/etc/mesos/mesos-master-env.sh` or use the template found in the same directory. +In this file, you have to define + + export MESOS_work_dir=WORK_DIRECTORY + +and it is recommended to uncommment + + export MESOS_log_dir=LOGGING_DIRECTORY + + +In order to configure the Mesos agents, you have to create `MESOS_HOME/etc/mesos/mesos-agent-env.sh` or use the template found in the same directory. +You have to configure + + export MESOS_master=MASTER_HOSTNAME:MASTER_PORT + +and uncomment + + export MESOS_log_dir=LOGGING_DIRECTORY + export MESOS_work_dir=WORK_DIRECTORY + +#### Mesos Library + +In order to run Java applications with Mesos you have to export `MESOS_NATIVE_JAVA_LIBRARY=MESOS_HOME/lib/libmesos.so` on Linux. +Under Mac OS X you have to export `MESOS_NATIVE_JAVA_LIBRARY=MESOS_HOME/lib/libmesos.dylib`. + +#### Deploying Mesos + +In order to start your mesos cluster, use the deployment script `MESOS_HOME/sbin/mesos-start-cluster.sh`. +In order to stop your mesos cluster, use the deployment script `MESOS_HOME/sbin/mesos-stop-cluster.sh`. +More information about the deployment scripts can be found [here](http://mesos.apache.org/documentation/latest/deploy-scripts/). + +### Installing Marathon + +Optionally, you may also [install Marathon](https://mesosphere.github.io/marathon/docs/) which will be necessary to run Flink in high availability (HA) mode. ### Pre-installing Flink vs Docker/Mesos containers -You may install Flink on all of your Mesos Master and Agent nodes. You can also -pull the binaries from the Flink web site during deployment and apply your -custom configuration before launching the application master. A more -convenient and easier to maintain approach is to use Docker containers to manage -the Flink binaries and configuration. +You may install Flink on all of your Mesos Master and Agent nodes. +You can also pull the binaries from the Flink web site during deployment and apply your custom configuration before launching the application master. +A more convenient and easier to maintain approach is to use Docker containers to manage the Flink binaries and configuration. This is controlled via the following configuration entries: @@ -152,60 +178,78 @@ If set to 'docker', specify the image name: In the `/bin` directory of the Flink distribution, you find two startup scripts which manage the Flink processes in a Mesos cluster: -1. mesos-appmaster.sh - This starts the Mesos application master which will register the Mesos - scheduler. It is also responsible for starting up the worker nodes. +1. `mesos-appmaster.sh` + This starts the Mesos application master which will register the Mesos scheduler. + It is also responsible for starting up the worker nodes. -2. mesos-taskmanager.sh - The entry point for the Mesos worker processes. You don't need to explicitly - execute this script. It is automatically launched by the Mesos worker node to - bring up a new TaskManager. +2. `mesos-taskmanager.sh` + The entry point for the Mesos worker processes. + You don't need to explicitly execute this script. + It is automatically launched by the Mesos worker node to bring up a new TaskManager. + +In order to run the `mesos-appmaster.sh` script you have to define `mesos.master` in the `flink-conf.yaml` or pass it via `-Dmesos.master=...` to the Java process. +Additionally, you should define the number of task managers which are started by Mesos via `mesos.initial-tasks`. +This value can also be defined in the `flink-conf.yaml` or passed as a Java property. + +When executing `mesos-appmaster.sh`, it will create a job manager on the machine where you executed the script. +In contrast to that, the task managers will be run as Mesos tasks in the Mesos cluster. + +#### General configuration + +It is possible to completely parameterize a Mesos application through Java properties passed to the Mesos application master. +This also allows to specify general Flink configuration parameters. +For example: + + bin/mesos-appmaster.sh \ + -Dmesos.master=master.foobar.org:5050 + -Djobmanager.heap.mb=1024 \ + -Djobmanager.rpc.port=6123 \ + -Djobmanager.web.port=8081 \ + -Dmesos.initial-tasks=10 \ + -Dmesos.resourcemanager.tasks.mem=4096 \ + -Dtaskmanager.heap.mb=3500 \ + -Dtaskmanager.numberOfTaskSlots=2 \ + -Dparallelism.default=10 ### High Availability -You will need to run a service like Marathon or Apache Aurora which takes care -of restarting the Flink master process in case of node or process failures. In -addition, Zookeeper needs to be configured like described in the -[High Availability section of the Flink docs]({{ site.baseurl }}/setup/jobmanager_high_availability.html) +You will need to run a service like Marathon or Apache Aurora which takes care of restarting the Flink master process in case of node or process failures. +In addition, Zookeeper needs to be configured like described in the [High Availability section of the Flink docs]({{ site.baseurl }}/setup/jobmanager_high_availability.html) -For the reconciliation of tasks to work correctly, please also set -`recovery.zookeeper.path.mesos-workers` to a valid Zookeeper path. +For the reconciliation of tasks to work correctly, please also set `recovery.zookeeper.path.mesos-workers` to a valid Zookeeper path. #### Marathon -Marathon needs to be set up to launch the `bin/mesos-appmaster.sh` script. In -particular, it should also adjust any configuration parameters for the Flink -cluster. +Marathon needs to be set up to launch the `bin/mesos-appmaster.sh` script. +In particular, it should also adjust any configuration parameters for the Flink cluster. Here is an example configuration for Marathon: { - "id": "basic-0", - "cmd": "$FLINK_HOME/bin/mesos-appmaster.sh -DconfigEntry=configValue -DanotherEntry=anotherValue ...", - "cpus": 1.0, - "mem": 2048, + "id": "flink", + "cmd": "$FLINK_HOME/bin/mesos-appmaster.sh -Djobmanager.heap.mb=1024 -Djobmanager.rpc.port=6123 -Djobmanager.web.port=8081 -Dmesos.initial-tasks=1 -Dmesos.resourcemanager.tasks.mem=1024 -Dtaskmanager.heap.mb=1024 -Dtaskmanager.numberOfTaskSlots=2 -Dparallelism.default=2 -Dmesos.resourcemanager.tasks.cpus=1", + "cpus": 1.0, + "mem": 1024 } -### Configuration parameters - -#### Mesos configuration entries +When running Flink with Marathon, the whole Flink cluster including the job manager will be run as Mesos tasks in the Mesos cluster. +### Configuration parameters -`mesos.initial-tasks`: The initial workers to bring up when the master - starts (**DEFAULT**: The number of workers specified at cluster startup). +`mesos.initial-tasks`: The initial workers to bring up when the master starts (**DEFAULT**: The number of workers specified at cluster startup). -`mesos.maximum-failed-tasks`: The maximum number of failed workers before - the cluster fails (**DEFAULT**: Number of initial workers) May be set to -1 - to disable this feature. +`mesos.maximum-failed-tasks`: The maximum number of failed workers before the cluster fails (**DEFAULT**: Number of initial workers). +May be set to -1 to disable this feature. -`mesos.master`: The Mesos master URL. The value should be in one of the - following forms: host:port, zk://host1:port1,host2:port2,.../path, - zk://username:password@host1:port1,host2:port2,.../path, - file:///path/to/file (where file contains one of the above) +`mesos.master`: The Mesos master URL. The value should be in one of the following forms: + +* `host:port` +* `zk://host1:port1,host2:port2,.../path` +* `zk://username:password@host1:port1,host2:port2,.../path` +* `file:///path/to/file` -`mesos.failover-timeout`: The failover timeout in seconds for the Mesos scheduler, after - which running tasks are automatically shut down (**DEFAULT:** 600). +`mesos.failover-timeout`: The failover timeout in seconds for the Mesos scheduler, after which running tasks are automatically shut down (**DEFAULT:** 600). `mesos.resourcemanager.artifactserver.port`:The config parameter defining the Mesos artifact server port to use. Setting the port to 0 will let the OS choose an available port. @@ -221,9 +265,7 @@ Here is an example configuration for Marathon: `mesos.resourcemanager.framework.user`: Mesos framework user (**DEFAULT:**"") -`mesos.resourcemanager.artifactserver.ssl.enabled`: Enables SSL for the Flink -artifact server (**DEFAULT**: true). Note that `security.ssl.enabled` also needs -to be set to `true` encryption to enable encryption. +`mesos.resourcemanager.artifactserver.ssl.enabled`: Enables SSL for the Flink artifact server (**DEFAULT**: true). Note that `security.ssl.enabled` also needs to be set to `true` encryption to enable encryption. `mesos.resourcemanager.tasks.mem`: Memory to assign to the Mesos workers in MB (**DEFAULT**: 1024) @@ -232,20 +274,3 @@ to be set to `true` encryption to enable encryption. `mesos.resourcemanager.tasks.container.type`: Type of the containerization used: "mesos" or "docker" (DEFAULT: mesos); `mesos.resourcemanager.tasks.container.image.name`: Image name to use for the container (**NO DEFAULT**) - - -#### General configuration - -It is possible to completely parameterize a Mesos application through Java -properties passed to the Mesos application master. This also allows to specify -general Flink configuration parameters. For example: - - bin/mesos-appmaster.sh \ - -Djobmanager.heap.mb=1024 \ - -Djobmanager.rpc.port=6123 \ - -Djobmanager.web.port=8081 \ - -Dmesos.initial-tasks=10 \ - -Dmesos.resourcemanager.tasks.mem=4096 \ - -Dtaskmanager.heap.mb=3500 \ - -Dtaskmanager.numberOfTaskSlots=2 \ - -Dparallelism.default=10