hadoop-common-issues mailing list archives

From "Yuqi Wang (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (HADOOP-15528) Deprecate ContainerLaunch#link by using FileUtil#SymLink
Date Fri, 06 Jul 2018 03:46:00 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-15528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16534409#comment-16534409 ]

Yuqi Wang edited comment on HADOOP-15528 at 7/6/18 3:45 AM:
------------------------------------------------------------

[~giovanni.fumarola], [~elgoiri], 

Some concerns from my side; please take a look and correct me if I am wrong:
 # *Possible security and resource isolation leak:*
 With the *old behavior*, the symlink operation is *executed in the batch script*, which runs
as a child process in a restricted-privilege, resource-isolated environment, such
as a Windows job object (with Windows secure container) or Linux cgroups.
 With the *new behavior*, the symlink operation is *executed by the NM itself*, as a child
process of the NM, so it shares the NM's execution environment.
 So I am worried that this may weaken security and resource isolation.
 # *The exit procedure is not straightforward, and the exit information is too limited to debug.*
 In the PATCH implementation:
 The symlink operation is executed before the container starts. If it fails, the NM just records
an "exit XXX" line in the batch script instead of throwing the failure to its caller. So even though
the symlink is executed before the container starts, the failure is not propagated until the container starts.
 When I try to debug a container failure, I will see a sudden "exit XXX" in the batch
script, with no other information about why the NM added that line.
 I hope we can execute the operation and propagate its exit status in the same execution environment
instead of splitting them across two. The old behavior keeps everything in the batch script, but the
new behavior splits it between the NM and the batch script.
 # *Better to have retry:*
 In the PATCH implementation:
 A symlink error raised on the container launch caller's side should be treated as a transient error,
so you will also need to add the corresponding symlink failure exit code to shouldCountTowardsMaxAttemptRetry,
so that the RM always retries the AM container when it hits a symlink error.
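
To illustrate item 2, here is a minimal sketch of failing fast in the caller instead of recording an exit line in the launch script. It is not the patch's code: the class {{SymlinkFailFast}}, the exception {{SymlinkSetupException}}, and the method {{createLinkOrThrow}} are hypothetical names, and it uses the JDK's {{java.nio.file.Files}} rather than {{FileUtil#SymLink}} so that it stays self-contained.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

/** Sketch only: none of these names come from the Hadoop code base. */
final class SymlinkFailFast {

  /** Hypothetical exception that carries both paths and the root cause. */
  static final class SymlinkSetupException extends IOException {
    SymlinkSetupException(Path link, Path target, Throwable cause) {
      super("Failed to create symlink " + link + " -> " + target, cause);
    }
  }

  /**
   * Creates the link and reports any failure to the caller immediately,
   * rather than deferring it to an "exit XXX" line in the launch script.
   */
  static void createLinkOrThrow(Path target, Path link)
      throws SymlinkSetupException {
    try {
      // JDK argument order: the link to create first, then its target.
      Files.createSymbolicLink(link, target);
    } catch (IOException | UnsupportedOperationException | SecurityException e) {
      throw new SymlinkSetupException(link, target, e);
    }
  }
}
```

With this shape, whoever debugs the container failure sees the link path, the target path, and the underlying cause at the point where the operation actually ran.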

Overall, at least for the PATCH implementation, I do not see any benefits.
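
For item 3, the retry accounting could look roughly like the sketch below. This is an assumption-laden illustration, not the body of shouldCountTowardsMaxAttemptRetry: the class {{RetryAccounting}}, the method {{countsTowardMaxAttempts}}, and both exit code constants are placeholders invented for this example, and the numeric values are not Hadoop's.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

/** Sketch only: names and exit code values are placeholders, not Hadoop's. */
final class RetryAccounting {

  // Hypothetical exit codes, chosen only for this example.
  static final int EXIT_PREEMPTED_EXAMPLE = -1001;
  static final int EXIT_SYMLINK_FAILED_EXAMPLE = -1002;

  // Failures considered transient: they should not use up an attempt.
  private static final Set<Integer> TRANSIENT_EXIT_CODES =
      Collections.unmodifiableSet(new HashSet<>(Arrays.asList(
          EXIT_PREEMPTED_EXAMPLE,
          EXIT_SYMLINK_FAILED_EXAMPLE)));

  /** Returns false for transient failures so the attempt can be retried. */
  static boolean countsTowardMaxAttempts(int exitCode) {
    return !TRANSIENT_EXIT_CODES.contains(exitCode);
  }
}
```

The point of the sketch is only that a symlink failure would need its own distinguishable exit code before it could be excluded from the attempt count.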



> Deprecate ContainerLaunch#link by using FileUtil#SymLink
> --------------------------------------------------------
>
>                 Key: HADOOP-15528
>                 URL: https://issues.apache.org/jira/browse/HADOOP-15528
>             Project: Hadoop Common
>          Issue Type: Sub-task
>            Reporter: Giovanni Matteo Fumarola
>            Assignee: Giovanni Matteo Fumarola
>            Priority: Major
>         Attachments: HADOOP-15528-HADOOP-15461.v1.patch, HADOOP-15528-HADOOP-15461.v2.patch,
>                      HADOOP-15528-HADOOP-15461.v3.patch
>
>
> {{ContainerLaunch}} currently uses its own utility to create links (including winutils).
> This should be deprecated and rely on {{FileUtil#SymLink}} which is already multi-platform
> and pure Java.




