hadoop-yarn-issues mailing list archives

From "Varun Vasudev (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-5673) [Umbrella] Re-write container-executor to improve security, extensibility, and portability
Date Wed, 07 Dec 2016 07:07:58 GMT

    [ https://issues.apache.org/jira/browse/YARN-5673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15727953#comment-15727953 ]

Varun Vasudev commented on YARN-5673:
-------------------------------------

Thanks for the feedback [~miklos.szegedi@cloudera.com]!

{quote}
What is the common functionality that all modules need and why do not we simply split them
into separate executables instead of loading them as modules to a core container-executor
process?
linux-container-executor
docker-container-executor
mount-cgroups
tc-executor

2. I also have a concern of maintenance. File system privileges setup is what system administrators
are most familiar with. We could let them use that instead of modifying proprietary configuration
files, how to locate load modules into a linux-container-executor binary. Any common functionality
like auditing, logging, setup checking can be statically linked to each executable. It also
has the advantage of setting suid bits separately. An example: /usr/bin/sudo and /usr/bin/passwd
tell in their names what they do and what you get if you set suid permission on them. An administrator
would only set it on the executables that are needed. Setting suid on the posix-container-executor
on the other hand means that it is allowed to load modules in a controlled but advanced way.
Separate executables made the life of administrators way easier I think. Each of these binaries
can have its own configuration file just like the modules you proposed.
{quote}

All of these binaries will require the setuid bit to be set, which means administrators will
have to set permissions on and manage 4 binaries. We would also have to worry about 4 binaries
that can be used for privilege escalation as opposed to one - any hot fix, for example, would
require all 4 binaries to be updated as opposed to just one. Interestingly, you feel that the
administrator overhead of managing 4 binaries is worth it, whereas some folks would prefer it
the other way round :). Do other folks feel that the multiple binaries approach is the way to go?

{quote}
1. I have a concern of native modules potentially loaded into the same process. Even if communication
between modules is not allowed this is a native binary, where all native code will have access
to the whole memory. Just like the current executor, when more features are running in the
same process with admin privileges, a faulty or malicious module may cause security issues
even loading and accessing others. Even if we can add protection, the protection code would
add more complexity.
{quote}

Fair point. The idea here is that -
(1) Administrators will not add arbitrary modules to the module list.
(2) The posix-container-executor will give up all privileges before loading the modules which
don't require administrator privileges.
(3) Administrators will have an option to turn off modules that require administrator privileges.
Would these help mitigate your concerns? The issue with the current setup is that there is
no clean way to enable/disable functionality that administrators do not want enabled on their
cluster.
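
To make points (1) to (3) concrete, here is a rough sketch of the loading policy. It is only an illustration and not the proposed implementation (the real container-executor is native code). The `Module` type, the module names, and the `allow_privileged` switch are all made up for this example.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Module:
    name: str               # hypothetical module name
    needs_privileges: bool  # True if the module must run with admin rights


def plan_module_loading(modules, enabled_names, allow_privileged):
    """Return (privileged, unprivileged) lists of modules to load.

    Only modules the administrator listed are considered (point 1), and
    privileged modules are skipped when allow_privileged is False (point 3).
    """
    enabled = [m for m in modules if m.name in enabled_names]
    privileged = [m for m in enabled if m.needs_privileges and allow_privileged]
    unprivileged = [m for m in enabled if not m.needs_privileges]
    return privileged, unprivileged


def drop_privileges(uid, gid):
    """Give up root before the unprivileged modules are loaded (point 2)."""
    os.setgroups([])  # clear supplementary groups while still privileged
    os.setgid(gid)    # change group first; it is not possible after setuid
    os.setuid(uid)    # from here on the process cannot regain root
```

The intended order would be: load whatever `plan_module_loading` returns as privileged, call `drop_privileges`, and only then load the unprivileged list.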

{quote}
A. One more separate issue that I wanted to ask your opinion about is the time launching a
container. First the container executor is executed, then a shell script that runs the actual
container like Java. Would not it be faster to launch container executor just once and communicate
launch commands through the standard pipe or a named pipe and keep it running as long as the
node manager is running?
{quote}
It probably would be, but container launch time hasn’t been something people have complained
about. Do you have some scenarios where container launch time has been an issue? The security
aspects of a long-running process versus one that is invoked on demand are different as well.
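
For reference, the long-running model from the quoted suggestion would look roughly like the loop below. This is a sketch only: the command names and the one-command-per-line format are assumptions for illustration, not an existing protocol between the node manager and the container-executor.

```python
import io

ALLOWED_COMMANDS = {"launch", "signal", "cleanup"}  # hypothetical command names


def serve(stream, handler):
    """Read one command per line until the stream closes; return the count handled."""
    handled = 0
    for line in stream:
        parts = line.split()
        if not parts or parts[0] not in ALLOWED_COMMANDS:
            continue  # skip blank or unknown requests instead of acting on them
        handler(parts[0], parts[1:])
        handled += 1
    return handled


# Usage with an in-memory stream standing in for the pipe.
requests = io.StringIO("launch container_1\nbogus x\ncleanup container_1\n")
serve(requests, lambda command, args: print(command, args))
```

Note that a process like this stays alive, and potentially privileged, for as long as the node manager runs, which is the security difference mentioned above.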

> [Umbrella] Re-write container-executor to improve security, extensibility, and portability
> ------------------------------------------------------------------------------------------
>
>                 Key: YARN-5673
>                 URL: https://issues.apache.org/jira/browse/YARN-5673
>             Project: Hadoop YARN
>          Issue Type: New Feature
>          Components: nodemanager
>            Reporter: Varun Vasudev
>            Assignee: Varun Vasudev
>         Attachments: container-executor Re-write Design Document.pdf
>
>
> As YARN adds support for new features that require administrator privileges (such as support
> for network throttling and docker), we’ve had to add new capabilities to the container-executor.
> This has led to a recognition that the current container-executor security features as well
> as the code could be improved. The current code is fragile and it’s hard to add new features
> without causing regressions. Some of the improvements that need to be made are -
> *Security*
> Currently the container-executor has limited security features. It relies primarily on
> the permissions set on the binary but does little additional security checking beyond that.
> There are a few outstanding issues today -
> - No audit log
> - No way to disable features - network throttling and docker support are built in and
>   there’s no way to turn them off at a container-executor level
> - Code can be improved - a lot of the code switches users back and forth in an arbitrary
>   manner
> - No input validation - the paths and files provided at invocation are not validated
>   or required to be in some specific location
> - No signing functionality - there is no way to enforce that the binary was invoked by
>   the NM and not by any other process
> *Code Issues*
> The code layout and implementation themselves can be improved. Some issues there are -
> - No support for log levels - everything is logged and this can’t be turned on or off
> - Extremely long set of invocation parameters (specifically during container launch) which
>   makes turning features on or off complicated
> - Poor test coverage - it’s easy to introduce regressions today due to the lack of
>   a proper test setup
> - Duplicate functionality - there is some amount of code duplication
> - Hard to make improvements or add new features due to the issues raised above
> *Portability*
> - The container-executor mixes platform dependent APIs with platform independent APIs,
>   making it hard to run it on multiple platforms. Allowing it to run on multiple platforms also
>   improves the overall code structure.
> One option is to improve the existing container-executor; however, it might be easier
> to start from scratch. That allows existing functionality to be supported until we are ready
> to switch to the new code.
> This umbrella JIRA is to capture all the work required for the new code. I'm going to
> work on a design doc for the changes - any suggestions or improvements are welcome.
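
As an illustration of the "no input validation" item in the description above, a check that a supplied path resolves to somewhere under an expected root could look like the sketch below. The function name and the example directories are hypothetical; the issue does not specify how validation would be done.

```python
import os


def is_within(root, candidate):
    """Return True if candidate resolves to a location inside root."""
    # realpath collapses ".." components and follows symlinks that exist,
    # so a path cannot escape the root through either trick.
    root_real = os.path.realpath(root)
    candidate_real = os.path.realpath(candidate)
    return os.path.commonpath([root_real, candidate_real]) == root_real


# Usage with made-up directories.
print(is_within("/var/yarn/local", "/var/yarn/local/app_1/container_1"))
print(is_within("/var/yarn/local", "/var/yarn/local/../../etc/passwd"))
```

Comparing with `os.path.commonpath` rather than a plain string prefix avoids accepting sibling directories such as `/var/yarn/localx`.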



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: yarn-issues-help@hadoop.apache.org

