hadoop-common-issues mailing list archives

From "Sanjay Radia (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-9671) Improve Hadoop security - master jira
Date Wed, 26 Jun 2013 01:16:20 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-9671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13693563#comment-13693563
] 

Sanjay Radia commented on HADOOP-9671:
--------------------------------------

Here is an initial draft of Hadoop security usage scenarios, a threat model, and the problems
that we would like to address.


*Hadoop Deployment Usage Scenarios*

The use cases below have two variations: with and without perimeter security (such as Knox).

* U1 Hadoop insecure deployment (i.e. using UGI-based “authentication”)
* U2 Hadoop deployment with Active Directory (Kerberos, LDAP) authentication
* U3 Hadoop deployment with Kerberos authentication
* U4 Hadoop deployment in an LDAP-only shop
* U5 Hadoop deployment in a public cloud (e.g. AWS, Azure, Rackspace)
* U6 Multiple Hadoop clusters in a single organization, each with different authentication
requirements and potentially a different IdP.


*Security Threat Model for Hadoop*
(This list is an extension of the list published in http://hortonworks.com/wp-content/uploads/2011/10/security-design_withCover-1.pdf)
# An unauthorized client may access an HDFS file via the RPC or via HTTP protocols.
# An unauthorized client may read/write a data block of a file at a DataNode via the pipeline
streaming data-transfer protocol.
# An unauthorized user may submit a job to a queue, or delete a job or change its priority.
# An unauthorized client may access intermediate data of a Map job via its task tracker's HTTP
shuffle protocol.
# An executing task may use the host operating system interfaces to access other tasks, or to
access local data, including intermediate Map output or the local storage of the DataNode that
runs on the same physical node.
# A task may masquerade as a Hadoop service component such as a DataNode, NameNode, job tracker,
task tracker etc.
# A user may submit a workflow to Oozie as another user.
# A service may attempt to impersonate a user by using the client-presented service access
token
# A service may attempt to impersonate another service by using the service-presented service
access token (when a service is acting as a client of another)
# A user may attempt to register as a service through service registration endpoints (is this
the same as 6?)

*Hadoop Security Problems*
# Perimeter security solution - Knox addresses this
# Remove the need to create Unix accounts on each compute node - (note Unix accounts are merely
for isolation and not for authentication.) Linux containers have the potential to fix this.
# Remove the need for root startup for Datanodes (HDFS-2856)
# Server authentication setup is painful, i.e. installing keytabs for each server. We need a simpler
solution for server-server mutual authentication (e.g. NN-DN) and client-server mutual authentication.
# Authentication for customers with only LDAP (both SSO jiras, HADOOP-9392 and HADOOP-9533,
are addressing this)
# Hadoop authentication should include group membership so that group membership checking
is not needed later. Note this is critical for public cloud deployments, where it is not
practical to call back from the cloud to the customer’s environment to get group membership.
(Both SSO jiras, HADOOP-9392 and HADOOP-9533, are addressing this.)
Related to problem 12.
# Remove the shared secret between NN and DN (potentially extensions to the SSO jiras)
# Remove the need for NN and JT delegation tokens (potentially extensions to the SSO jiras)
# Encryption on communication pipes - verify configurations and test
# Encryption on data. One solution is to use OS-level encryption; someone needs to verify
and test this.
# Add ACLs to HDFS
# Change Hadoop tokens to include group membership - see the Azure use case (U5) above.  Hadoop
tokens need to support arbitrary attributes for ABAC.
# Implementation improvements and bugs
** Change the Hadoop security implementation so that UGI (i.e. non-secure Hadoop deployment) uses
delegation tokens and block access tokens. (HADOOP-8779)
** Change the implementation of Hadoop RPC security to make the authentication pluggable -
note that architecturally Hadoop RPC authentication is pluggable, but the code has UGI and
Kerberos burnt in too deeply.
# Provide the ability to identify poorly or maliciously behaving applications - independently
from applications from the same user that may be behaving properly. Note this is not a security
issue per se, but we lack an application/job identity that could be used to throttle a misbehaving
application. The Hadoop job/HDFS delegation token could be used for that purpose - is this
a reasonable use for it?
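
To make problems 6 and 12 a bit more concrete, here is a minimal sketch of a token that carries group membership and arbitrary attributes, so that an authorization check needs no callback to the customer's directory. Everything in it is hypothetical: the class names, fields, and the isAllowed check are invented for illustration and are not existing Hadoop APIs. A real token would also have to be signed and given an expiry, which is omitted here.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch only; none of these types exist in Hadoop. */
final class TokenAttributesSketch {

    /** A token that carries the user, the user's groups, and free-form attributes. */
    static final class SketchToken {
        final String user;
        final Set<String> groups;
        final Map<String, String> attributes;

        SketchToken(String user, Set<String> groups, Map<String, String> attributes) {
            this.user = user;
            // Defensive copies so the token contents cannot change after issuance.
            this.groups = Collections.unmodifiableSet(new HashSet<>(groups));
            this.attributes = Collections.unmodifiableMap(new HashMap<>(attributes));
        }
    }

    /**
     * Attribute-based check that uses only what the token carries:
     * the token must list the required group and match every required attribute.
     */
    static boolean isAllowed(SketchToken token,
                             String requiredGroup,
                             Map<String, String> requiredAttributes) {
        if (!token.groups.contains(requiredGroup)) {
            return false;
        }
        for (Map.Entry<String, String> required : requiredAttributes.entrySet()) {
            String actual = token.attributes.get(required.getKey());
            if (!required.getValue().equals(actual)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Set<String> groups = new HashSet<>();
        groups.add("analysts");
        Map<String, String> attributes = new HashMap<>();
        attributes.put("department", "finance");
        SketchToken token = new SketchToken("alice", groups, attributes);

        Map<String, String> policy = Collections.singletonMap("department", "finance");
        System.out.println("analysts/finance allowed: " + isAllowed(token, "analysts", policy));
        System.out.println("admins/finance allowed: " + isAllowed(token, "admins", policy));
    }
}
```

The point of the sketch is that the decision is made entirely from the token contents. Whether the existing delegation tokens can be extended this way, or a new token type is needed, is still an open question for the SSO jiras.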


  
                
> Improve Hadoop security - master jira
> -------------------------------------
>
>                 Key: HADOOP-9671
>                 URL: https://issues.apache.org/jira/browse/HADOOP-9671
>             Project: Hadoop Common
>          Issue Type: Improvement
>            Reporter: Sanjay Radia
>            Assignee: Sanjay Radia
>


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
