hadoop-yarn-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Ivan Mitic (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-2198) Remove the need to run NodeManager as privileged account for Windows Secure Container Executor
Date Sat, 09 Aug 2014 20:50:12 GMT

    [ https://issues.apache.org/jira/browse/YARN-2198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14091901#comment-14091901

Ivan Mitic commented on YARN-2198:

Thanks again Remus for working on this big and important piece of work. I went ~70% thru the
patch and below is my first pass of comments. Will review the rest in a day or two.

1. nativeio.c: {code}
#ifdef UNIX
    THROW(env, "java/io/IOException",
      "The function createTaskAsUser is not supported on Unix");
    return -1;
Should we return null here?
2. {code}
  LPCWSTR lpszCwd = NULL, lpszJobName = NULL, 
    lpszUser = NULL, lpszPidFile = NULL, lpszCmdLine = NULL;
Nit: nativeio code uses different naming convention for local variables. Please try to be
consistent with the rest of the file.
3. nativeio.c: {code}

  if (lpszCwd)     (*env)->ReleaseStringChars(env, cwd, lpszCwd);
  if (lpszJobName) (*env)->ReleaseStringChars(env, jobName, lpszJobName);
  if (lpszUser)    (*env)->ReleaseStringChars(env, user, lpszUser);
  if (lpszPidFile) (*env)->ReleaseStringChars(env, pidFile, lpszPidFile);
  if (lpszCmdLine) (*env)->ReleaseStringChars(env, cmdLine, lpszCmdLine);

  if (dwError != ERROR_SUCCESS) {
    throw_ioe (env, dwError);
Nit: I would move {{throw_ioe}} if check before done:, the code flow will be less error prone
4. winutils_process_stub.c: Can {{env->NewGlobalRef())) return null/throw? Should we handle
5. winutils_process_stub.c: You should properly handle the {{GetExitCodeProcess()}} failure
6. winutils_process_stub.c: {code}
  HANDLE hProcess, hThread;
7. client.c: Are RPC_STATUS error codes compatible with winerror codes? (semantic around checking
for error)
8. config.cpp: {code}
#define YARN_SITE_XML_PATH L"%HADOOP_CONF_DIR%\\yarn-site.xml"
#define YARN_DEFAULT_XML_PATH L"%HADOOP_CONF_DIR%\\yarn-default.xml"
Wondering if there is a way to get to config files without adding a dependency on env variables?
We are looking into how to to make hadoop xcopy deployable and evn variables are a pain to
handle. One way to mitigate is to try to resolve the path relatively and if that fails fall

9. config.cpp: GetConfigValue: {code}
  if (*len) {
    goto done;
This error check is unintuitive. Can you please be more explicit?
10. config.cpp: {code}
DWORD GetConfigValue(__in LPCWSTR keyName, 
  __out size_t* len, __out_bcount(len) LPCWSTR* value) {
Are SAL annotations correct? For strings one would usually use __out_ecount()?
11. config.cpp: {code}
DWORD GetConfigValueFromXmlFile(__in LPCWSTR xmlFile, __in LPCWSTR keyName, 
  __out size_t* outLen, __out_bcount(len) LPCWSTR* outValue) {
SAL annotation {{__out_bcount}}? Also outLen->len in the annotation.
11. config.cpp: GetConfigValueFromXmlFile: 
  hr = CoInitialize(NULL);
  ERROR_CHECK_HRESULT_DONE(hr, L"CoInitialize");
This should be before StringCbPrintf to guarantee that CoInit and CoUninit are balanced. 
12. hdpwinutilsvc.idl: Name does not seem appropriate for apache... possibly name it just
winutilsvc.idl. Should we use spaces in this file for consistency? 
13. winutils.h: {code}
DWORD SplitStringIgnoreSpaceW(__in size_t len, __in_bcount(len) LPCWSTR source, 
  __in WCHAR deli, 
  __out size_t* count, __out_ecount(count) WCHAR*** out);
{code} __in_bcount(len) -> __in_ecount(len)
14. libwinutils.c: SplitStirngIgnoreSpaceW: Looking at this function (and some other xml parsing
functions) I'm wondering if this is good opportunity to introduce unittests for our C code,
as the complexity started increasing beyond just windows OS calls, where there is little value
in unittesting. 
15. libwinutils.c: BuildServiceSecurityDescriptor: {code}
  eas = (EXPLICIT_ACCESS*) alloca(sizeof(EXPLICIT_ACCESS) * (grantSidCount + denySidCount));
Should we deallocate this when BuildSecurityDescriptor fails?
I don't think it is required to do this now, just wanted to bring it up: if our native codebase
continues to grow at this pace we should consider introducing smart pointers. It is becoming
impossibly hard to properly manage the memory in all success/failure cases. This becomes more
important now that we have long running NM native client and winutils service. 
16. What is the behaviour of calling {{winutils service}}. Will this command install and start
a winutils.exe service under SYSTEM account, and exit?

> Remove the need to run NodeManager as privileged account for Windows Secure Container
> ----------------------------------------------------------------------------------------------
>                 Key: YARN-2198
>                 URL: https://issues.apache.org/jira/browse/YARN-2198
>             Project: Hadoop YARN
>          Issue Type: Improvement
>            Reporter: Remus Rusanu
>            Assignee: Remus Rusanu
>              Labels: security, windows
>         Attachments: YARN-2198.1.patch, YARN-2198.2.patch
> YARN-1972 introduces a Secure Windows Container Executor. However this executor requires
a the process launching the container to be LocalSystem or a member of the a local Administrators
group. Since the process in question is the NodeManager, the requirement translates to the
entire NM to run as a privileged account, a very large surface area to review and protect.
> This proposal is to move the privileged operations into a dedicated NT service. The NM
can run as a low privilege account and communicate with the privileged NT service when it
needs to launch a container. This would reduce the surface exposed to the high privileges.

> There has to exist a secure, authenticated and authorized channel of communication between
the NM and the privileged NT service. Possible alternatives are a new TCP endpoint, Java RPC
etc. My proposal though would be to use Windows LPC (Local Procedure Calls), which is a Windows
platform specific inter-process communication channel that satisfies all requirements and
is easy to deploy. The privileged NT service would register and listen on an LPC port (NtCreatePort,
NtListenPort). The NM would use JNI to interop with libwinutils which would host the LPC client
code. The client would connect to the LPC port (NtConnectPort) and send a message requesting
a container launch (NtRequestWaitReplyPort). LPC provides authentication and the privileged
NT service can use authorization API (AuthZ) to validate the caller.

This message was sent by Atlassian JIRA

View raw message