hadoop-mapreduce-issues mailing list archives

From "Alejandro Abdelnur (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAPREDUCE-4417) add support for encrypted shuffle
Date Fri, 13 Jul 2012 05:13:35 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-4417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13413498#comment-13413498 ]

Alejandro Abdelnur commented on MAPREDUCE-4417:
-----------------------------------------------

When looking at encryption on the wire for the shuffle, the alternatives that came up were
transport encryption (HTTPS) and data/spill encryption (doable via a codec).

Using HTTPS requires improving the Fetcher/ShuffleHandler (Netty/JDK-URL) to use HTTPS and
configuring certificates. It is a well-understood, standard, proven technology and gives you
end-to-end confidentiality, integrity, and server authentication (and optionally client authentication)
out of the box, with little room to get things wrong. The server certificates' private
keys are out of reach of job tasks (they are used by the NM, similar to Kerberos keytabs).
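
As a rough illustration of the client side of the HTTPS option, the sketch below builds an
SSLContext from a truststore and attaches it to a JDK HttpsURLConnection. It is not the
actual Fetcher code: the class name HttpsFetchSketch, the helper names, and the host, port
and path in the usage note are all assumptions made for this example.

```java
import javax.net.ssl.HttpsURLConnection;
import javax.net.ssl.SSLContext;
import javax.net.ssl.TrustManagerFactory;
import java.io.IOException;
import java.net.URL;
import java.security.GeneralSecurityException;
import java.security.KeyStore;

// Hypothetical sketch, not the real Fetcher: shows only the JDK calls involved.
public class HttpsFetchSketch {

    // Builds a TLS context that trusts the certificates in trustStore.
    // Passing null makes the JDK fall back to its default truststore.
    static SSLContext contextFor(KeyStore trustStore) throws GeneralSecurityException {
        TrustManagerFactory tmf =
            TrustManagerFactory.getInstance(TrustManagerFactory.getDefaultAlgorithm());
        tmf.init(trustStore);
        SSLContext ctx = SSLContext.getInstance("TLS");
        // No key managers: the client is not authenticated in this sketch.
        ctx.init(null, tmf.getTrustManagers(), null);
        return ctx;
    }

    // Prepares (but does not yet connect) an HTTPS connection that uses ctx.
    static HttpsURLConnection open(URL url, SSLContext ctx) throws IOException {
        HttpsURLConnection conn = (HttpsURLConnection) url.openConnection();
        conn.setSSLSocketFactory(ctx.getSocketFactory());
        return conn;
    }
}
```

A caller would do something like
open(java.net.URI.create("https://nm.example.com:13562/mapOutput").toURL(), contextFor(null)),
where the URL is purely illustrative. Server authentication and integrity then come from the
TLS handshake itself rather than from anything the job has to set up.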


Using a codec requires (leveraging an existing plugin point) a compression codec implementation
that adds cipher-stream wrappers to the original streams and, in addition, can delegate to
a real compression codec (so that compression is not lost when encrypting). This requires
us to choose a Cipher implementation by hand. I'm not an expert on this, and I'm not sure
which one would be the best choice or what the weaknesses of each are (http://en.wikipedia.org/wiki/Stream_cipher#Comparison_Of_Stream_Ciphers).
Using a cipher on its own would provide confidentiality, but it would not provide integrity
or man-in-the-middle protection (unless we end up implementing something like TLS). In addition,
both ends are controlled by job tasks, so it becomes the responsibility of the user to create/distribute/protect
the secrets that are the basis of confidentiality. Finally, with the codec approach the HTTP
shuffle request/response headers go in the clear, which could enable a man-in-the-middle attack.
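
To make the codec option concrete, here is a minimal sketch of the stream wrapping it implies,
using only JDK classes rather than the Hadoop codec interfaces. The class name
CipherCodecSketch, the choice of AES/CTR/NoPadding, and GZIP as the delegate compressor
are assumptions for illustration, not a recommendation; key and IV distribution is left to
the caller, which is exactly the burden described above.

```java
import javax.crypto.Cipher;
import javax.crypto.CipherInputStream;
import javax.crypto.CipherOutputStream;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.security.GeneralSecurityException;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical sketch: compress first, then encrypt, by stacking stream wrappers.
public class CipherCodecSketch {
    // Assumed cipher choice; it gives confidentiality only, no authentication.
    private static final String TRANSFORMATION = "AES/CTR/NoPadding";

    private static Cipher cipher(int mode, byte[] key, byte[] iv)
            throws GeneralSecurityException {
        Cipher c = Cipher.getInstance(TRANSFORMATION);
        c.init(mode, new SecretKeySpec(key, "AES"), new IvParameterSpec(iv));
        return c;
    }

    // Data written here is gzip-compressed, then encrypted, then sent to raw.
    static OutputStream wrapOutput(OutputStream raw, byte[] key, byte[] iv)
            throws GeneralSecurityException, IOException {
        return new GZIPOutputStream(
            new CipherOutputStream(raw, cipher(Cipher.ENCRYPT_MODE, key, iv)));
    }

    // Bytes read from raw are decrypted, then decompressed.
    static InputStream wrapInput(InputStream raw, byte[] key, byte[] iv)
            throws GeneralSecurityException, IOException {
        return new GZIPInputStream(
            new CipherInputStream(raw, cipher(Cipher.DECRYPT_MODE, key, iv)));
    }

    // Convenience helper: returns the bytes that would go on the wire.
    static byte[] encode(byte[] plain, byte[] key, byte[] iv)
            throws GeneralSecurityException, IOException {
        ByteArrayOutputStream wire = new ByteArrayOutputStream();
        try (OutputStream out = wrapOutput(wire, key, iv)) {
            out.write(plain);
        }
        return wire.toByteArray();
    }

    // Convenience helper: recovers the original bytes from the wire bytes.
    static byte[] decode(byte[] wireBytes, byte[] key, byte[] iv)
            throws GeneralSecurityException, IOException {
        ByteArrayOutputStream plain = new ByteArrayOutputStream();
        try (InputStream in = wrapInput(new ByteArrayInputStream(wireBytes), key, iv)) {
            byte[] buf = new byte[4096];
            int n;
            while ((n = in.read(buf)) != -1) {
                plain.write(buf, 0, n);
            }
        }
        return plain.toByteArray();
    }
}
```

Both sides need the same 16-byte key and 16-byte IV, and an IV must never be reused with the
same key in CTR mode. Nothing in this sketch authenticates the peer or the HTTP headers, so
it addresses confidentiality of the payload only.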

                
> add support for encrypted shuffle
> ---------------------------------
>
>                 Key: MAPREDUCE-4417
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4417
>             Project: Hadoop Map/Reduce
>          Issue Type: New Feature
>          Components: mrv2, security
>    Affects Versions: 2.0.0-alpha
>            Reporter: Alejandro Abdelnur
>            Assignee: Alejandro Abdelnur
>             Fix For: 2.0.1-alpha
>
>
> Currently Shuffle fetches go in the clear. While Kerberos provides comprehensive authentication
> for the cluster, it does not provide confidentiality.
> When processing sensitive data, confidentiality may be desired (at the expense of job
> performance and resource utilization for doing encryption).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

       
