From: vrozov@apache.org
To: commits@apex.apache.org
Message-Id: <0f8e0177b42947c4bff3b77f343a4a52@git.apache.org>
X-Mailer: ASF-Git Admin Mailer
Subject: apex-core git commit: Documentation for CLI support for web service authentication for Kerberos SPNEGO, BASIC and DIGEST mechanisms
Date: Fri, 7 Oct 2016 15:47:21 +0000 (UTC)

Repository: apex-core
Updated Branches:
  refs/heads/master 0bdf771f8 -> a490ee04d


Documentation for CLI support for web service authentication for Kerberos SPNEGO, BASIC and DIGEST mechanisms

Project: http://git-wip-us.apache.org/repos/asf/apex-core/repo
Commit: http://git-wip-us.apache.org/repos/asf/apex-core/commit/a490ee04
Tree: http://git-wip-us.apache.org/repos/asf/apex-core/tree/a490ee04
Diff: http://git-wip-us.apache.org/repos/asf/apex-core/diff/a490ee04

Branch: refs/heads/master
Commit: a490ee04d028a4d8a6285ab75e13d663b1d671b7
Parents: 0bdf771
Author: Pramod Immaneni
Authored: Wed Oct 5 13:55:56 2016 -0700
Committer: Pramod Immaneni
Committed: Wed Oct 5 13:55:56 2016 -0700

----------------------------------------------------------------------
 docs/security.md | 40 ++++++++++++++++++++++++++++++++++++++--
 1 file changed, 38 insertions(+), 2 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/apex-core/blob/a490ee04/docs/security.md
----------------------------------------------------------------------
diff --git a/docs/security.md b/docs/security.md
index fb4a486..6b1b8b6 100644
--- a/docs/security.md
+++ b/docs/security.md
@@ -15,7 +15,7 @@ The Apex command line interface (CLI) program, `apex`, is used to launch applica
 ### CLI Configuration
 
-  When Kerberos security is enabled in Hadoop, a Kerberos ticket granting ticket (TGT) or the Kerberos credentials of the user are needed by the CLI program `apex` to authenticate with Hadoop for any operation. Kerberos credentials are composed of a principal and either a _keytab_ or a password. For security and operational reasons only keytabs are supported in Hadoop and by extension in Apex platform. When user credentials are specified, all operations including launching application are performed as that user.
+When Kerberos security is enabled in Hadoop, a Kerberos ticket-granting ticket (TGT) or the Kerberos credentials of the user are needed by the CLI program `apex` to authenticate with Hadoop for any operation. Kerberos credentials are composed of a principal and either a _keytab_ or a password. For security and operational reasons, only keytabs are supported in Hadoop and, by extension, in the Apex platform. When user credentials are specified, all operations, including launching applications, are performed as that user.
 
 #### Using kinit
 
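For illustration (not part of the patch): the Kerberos credentials described in the paragraph above are supplied to the CLI as configuration properties. The property `dt.authentication.principal` is referenced later in this file; the companion keytab property name `dt.authentication.keytab` is assumed here for the sketch.

```xml
<!-- Hedged sketch: Kerberos credentials for the apex CLI.
     dt.authentication.principal is cited in this document;
     dt.authentication.keytab is an assumed companion property. -->
<property>
  <name>dt.authentication.principal</name>
  <value>user@EXAMPLE.COM</value>
</property>
<property>
  <name>dt.authentication.keytab</name>
  <value>/path/to/user.keytab</value>
</property>
```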
@@ -49,7 +49,7 @@ The property `dt.authentication.principal` specifies the Kerberos user principal
 
 ### Web Services security
 
-Alongside every Apex application is an application master process running called Streaming Container Manager (STRAM). STRAM manages the application by handling the various control aspects of the application such as orchestrating the execution of the application on the cluster, playing a key role in scalability and fault tolerance, providing application insight by collecting statistics among other functionality.
+Alongside every Apex application runs an application master process called the Streaming Container Manager (STRAM). STRAM manages the application by handling its various control aspects, such as orchestrating the execution of the application on the cluster, playing a key role in scalability and fault tolerance, and providing application insight by collecting statistics, among other functionality.
 
 STRAM provides a web service interface to introspect the state of the application and its various components and to make dynamic changes to the applications. Some examples of supported functionality are getting resource usage and partition information of various operators, getting operator statistics and changing properties of running operators.
 
@@ -75,6 +75,42 @@ The security option value can be `ENABLED`, `FOLLOW_HADOOP_AUTH`, `FOLLOW_HADOOP
 
 The subsequent sections talk about how security works in Apex. This information is not needed by users but is intended for the inquisitive technical audience who wants to know how security works.
 
+#### CLI setup
+
+The CLI program `apex` connects to the web service endpoint of the STRAM for a running application to query for information or to make changes to it. In order to do that, it first has to connect to the YARN proxy web service and get the necessary connection information and credentials to connect to STRAM. The proxy web service may have security enabled, and in that case the CLI program `apex` first needs to authenticate with the service before it can get any information.
+
+Hadoop allows a lot of flexibility in the kind of security used for the proxy. It lets users plug in their own authentication provider, specified as a Java class name, and comes bundled with a provider for Kerberos SPNEGO authentication. Some distributions also include a provider for BASIC authentication via SASL.
+
+The CLI `apex` has built-in support for the Kerberos SPNEGO, BASIC and DIGEST authentication mechanisms. Because of the way the authentication provider is configured for the proxy on the Hadoop side, there is no reliable way to determine beforehand what kind of authentication is being used; only at runtime, when the CLI connects to the proxy web service, will it know the type of authentication the service is using. For this reason, `apex` allows the user to configure credentials for multiple authentication mechanisms it supports and will pick the one that matches what the service expects.
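For illustration (not part of the patch): because the mechanism is only discovered at runtime, credentials for several mechanisms can coexist in the same configuration and the CLI uses whichever set matches. The sketch below uses the property names introduced in the patch; the values are placeholders.

```xml
<!-- Sketch: credentials for more than one mechanism configured side by side;
     the CLI picks the set matching what the proxy advertises at runtime. -->
<property>
  <name>dt.authentication.basic.username</name>
  <value>alice</value>
</property>
<property>
  <name>dt.authentication.basic.password</name>
  <value>basic-secret</value>
</property>
<property>
  <name>dt.authentication.digest.username</name>
  <value>alice</value>
</property>
<property>
  <name>dt.authentication.digest.password</name>
  <value>digest-secret</value>
</property>
```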
+
+If the authentication mechanism is Kerberos SPNEGO, the properties listed in the [Using Kerberos credentials](#using-kerberos-credentials) section above for general communication with Hadoop are sufficient. No additional properties are needed.
+
+For BASIC authentication, the credentials can be specified using the following properties:
+
+```xml
+<property>
+  <name>dt.authentication.basic.username</name>
+  <value>username</value>
+</property>
+<property>
+  <name>dt.authentication.basic.password</name>
+  <value>password</value>
+</property>
+```
+
+For DIGEST authentication, the credentials can be specified using the following properties:
+
+```xml
+<property>
+  <name>dt.authentication.digest.username</name>
+  <value>username</value>
+</property>
+<property>
+  <name>dt.authentication.digest.password</name>
+  <value>password</value>
+</property>
+```
+
 ### Token Refresh
 
 Apex applications, at runtime, use delegation tokens to authenticate with Hadoop services when communicating with them, as described in the security architecture section below. The delegation tokens are originally issued by these Hadoop services and have an expiry time period, typically 7 days. The tokens become invalid beyond this time and the applications will no longer be able to communicate with the Hadoop services. For long-running applications this presents a problem.
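For illustration (not part of the patch): the roughly 7-day expiry mentioned above is bounded by the token max-lifetime settings of the issuing Hadoop services. The sketch below shows the stock Hadoop properties involved; both default to 604800000 ms, i.e. 7 days.

```xml
<!-- Stock Hadoop settings that bound delegation token lifetime;
     both default to 604800000 ms (7 days). Shown for illustration only. -->

<!-- hdfs-site.xml -->
<property>
  <name>dfs.namenode.delegation.token.max-lifetime</name>
  <value>604800000</value>
</property>

<!-- yarn-site.xml -->
<property>
  <name>yarn.resourcemanager.delegation.token.max-lifetime</name>
  <value>604800000</value>
</property>
```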