Date: Tue, 17 Jun 2014 18:21:08 +0000 (UTC)
From: "Vinod Kumar Vavilapalli (JIRA)"
To: yarn-issues@hadoop.apache.org
Reply-To: yarn-issues@hadoop.apache.org
Subject: [jira] [Commented] (YARN-1972) Implement secure Windows Container Executor

    [ https://issues.apache.org/jira/browse/YARN-1972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14034160#comment-14034160 ]

Vinod Kumar Vavilapalli commented on YARN-1972:
-----------------------------------------------

bq. All in all a very high privilege required for NM. We are considering a future iteration in which we extract the privileged operations into a dedicated NT service (=daemon) and bestow the high privileges only to this service.

Thanks. Let's document this in a Windows specific docs page.

bq. 
You are launching so many commands for every container - to chown files, to copy files etc.

bq. We'll measure. [..] I don't think that moving the localization into native code would result in much benefit over a proper Java implementation.

I'd file an investigation ticket to track this.

bq. DCE and WCE no longer create user file cache, this is done solely by the localizer initDirs. DCE Test modified to reflect this. DCE.createUserCacheDirs renamed to createUserAppCacheDirs accordingly

The division of responsibility between launching multiple commands before starting the localizer and the stuff that happens inside the localizer: Unfortunately, this still isn't ideal. Having userCache created by the ContainerExecutor but not file-cache is asymmetric and confusing. I propose that we split this refactoring into a separate JIRA and stick to your original code. Apologies for the back-and-forth on this one.

bq. There is more feedback to address (DRY between LCE and WCE localization launch, proper place for localization classpath jar).

So, you will work on them here itself, right?

Looks fine otherwise, except for the above comments and a request for some basic documentation.

> Implement secure Windows Container Executor
> -------------------------------------------
>
>                 Key: YARN-1972
>                 URL: https://issues.apache.org/jira/browse/YARN-1972
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: nodemanager
>            Reporter: Remus Rusanu
>            Assignee: Remus Rusanu
>              Labels: security, windows
>         Attachments: YARN-1972.1.patch, YARN-1972.2.patch
>
>
> h1. Windows Secure Container Executor (WCE)
> YARN-1063 adds the necessary infrastructure to launch a process as a domain user as a solution for the problem of having a security boundary between processes executed in YARN containers and the Hadoop services. The WCE is a container executor that leverages the winutils capabilities introduced in YARN-1063 and launches containers as an OS process running as the job submitter user.
> A description of the S4U infrastructure used by YARN-1063 and of the alternatives considered can be read on that JIRA.
> The WCE is based on the DefaultContainerExecutor. It relies on the DCE to drive the flow of execution, but it overrides some methods to the effect of:
> * changes the DCE-created user cache directories to be owned by the job user and by the nodemanager group.
> * changes the actual container run command to use the 'createAsUser' command of winutils task instead of 'create'
> * runs the localization as a standalone process instead of an in-process Java method call. This in turn relies on the winutils createAsUser feature to run the localization as the job user.
>
> When compared to LinuxContainerExecutor (LCE), the WCE has some minor differences:
> * it does not delegate the creation of the user cache directories to the native implementation.
> * it does not require special handling to be able to delete user files
> The WCE design came out of practical trial and error. I had to iron out some issues around the Windows script shell limitations (command line length) to get it to work, the biggest issue being the huge CLASSPATH that is commonplace in Hadoop container executions. The job container itself already deals with this via a so-called 'classpath jar', see HADOOP-8899 and YARN-316 for details. For the WCE localizer, which is launched as a separate container, the same issue had to be resolved and I used the same 'classpath jar' approach.
> h2. Deployment Requirements
> To use the WCE one needs to set `yarn.nodemanager.container-executor.class` to `org.apache.hadoop.yarn.server.nodemanager.WindowsSecureContainerExecutor` and set `yarn.nodemanager.windows-secure-container-executor.group` to the name of a Windows security group that the nodemanager service principal is a member of (equivalent of LCE `yarn.nodemanager.linux-container-executor.group`).
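As an illustration of the two settings described above, a yarn-site.xml fragment might look like the following sketch. The property names and the executor class name are taken from the description; the group value `hadoopusers` is only a placeholder (an assumption), to be replaced with an actual Windows security group that the nodemanager service principal belongs to.

```xml
<configuration>
  <!-- Select the Windows secure container executor (class name from the description) -->
  <property>
    <name>yarn.nodemanager.container-executor.class</name>
    <value>org.apache.hadoop.yarn.server.nodemanager.WindowsSecureContainerExecutor</value>
  </property>
  <!-- Placeholder group name (assumption): use a group the NM service principal is a member of -->
  <property>
    <name>yarn.nodemanager.windows-secure-container-executor.group</name>
    <value>hadoopusers</value>
  </property>
</configuration>
```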
> Unlike the LCE, the WCE does not require any configuration outside of Hadoop's own yarn-site.xml.
> For WCE to work the nodemanager must run as a service principal that is a member of the local Administrators group or LocalSystem. This is derived from the need to invoke the LoadUserProfile API, which mentions these requirements in its specification. This is in addition to the SE_TCB privilege mentioned in YARN-1063, but this requirement will automatically imply that the SE_TCB privilege is held by the nodemanager. For the Linux speakers in the audience, the requirement is basically to run NM as root.
> h2. Dedicated high privilege Service
> Due to the high privilege required by the WCE we had discussed the need to isolate the high privilege operations into a separate process, an 'executor' service that is solely responsible for starting the containers (including the localizer). The NM would have to authenticate, authorize and communicate with this service via an IPC mechanism and use this service to launch the containers. I still believe we'll end up deploying such a service, but the effort to onboard such a new platform-specific service on the project is not trivial.



--
This message was sent by Atlassian JIRA
(v6.2#6252)