hadoop-mapreduce-issues mailing list archives

From "Benoy Antony (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAPREDUCE-4491) Encryption and Key Protection
Date Mon, 13 Aug 2012 18:39:38 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-4491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13433401#comment-13433401 ]

Benoy Antony commented on MAPREDUCE-4491:
-----------------------------------------

One of the goals of this feature is to encrypt files both in transit and at rest (when
stored on disk). One way to achieve this is to depend on software or hardware that provides
encryption in the local file system, combined with HDFS-3637 and MR shuffle encryption.

This jira explores an alternative approach to the problem that does not depend on special
software for local file system encryption.

The key advantages of this approach over the local file system encryption approach are

1) A file can be decrypted only if the user provides the correct key. Even if someone
manages to read the file, they cannot read its contents without the key. The user must
possess the key in addition to having read permission, so there are two levels of protection.

A user could accidentally set "read" permissions for everyone, or a superuser could read
the file. In both cases, this scheme still protects the data.

2) No dependency on local file system encryption software.  This approach allows encryption
without such special setup.

3) A file is encrypted/decrypted only during processing, not each time it is read. This
results in fewer encryption/decryption operations.
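
To illustrate point 1, here is a minimal sketch of the "key in addition to read permission"
idea. It is not the codec from the patch (the feature itself adds PGP); AES-GCM from the
standard JCE is used here only as a stand-in, and the class and method names
(KeyedFileCipherSketch, canRead, and so on) are hypothetical.

```java
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;

// Hypothetical illustration only; not part of the MAPREDUCE-4491 patch.
class KeyedFileCipherSketch {
    private static final int IV_BYTES = 12;   // nonce length commonly used with GCM
    private static final int TAG_BITS = 128;  // authentication tag length

    // Generates a fresh 128-bit AES key (in the proposal, keys come from a keystore).
    static SecretKey newKey() throws GeneralSecurityException {
        KeyGenerator gen = KeyGenerator.getInstance("AES");
        gen.init(128);
        return gen.generateKey();
    }

    // Returns IV || ciphertext || tag, i.e. the bytes that would be stored on disk.
    static byte[] encrypt(SecretKey key, byte[] plain) throws GeneralSecurityException {
        byte[] iv = new byte[IV_BYTES];
        new SecureRandom().nextBytes(iv);
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, iv));
        byte[] body = cipher.doFinal(plain);
        byte[] stored = new byte[iv.length + body.length];
        System.arraycopy(iv, 0, stored, 0, iv.length);
        System.arraycopy(body, 0, stored, iv.length, body.length);
        return stored;
    }

    // Throws a GeneralSecurityException if the key does not match the stored bytes.
    static byte[] decrypt(SecretKey key, byte[] stored) throws GeneralSecurityException {
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.DECRYPT_MODE, key,
                new GCMParameterSpec(TAG_BITS, stored, 0, IV_BYTES));
        return cipher.doFinal(stored, IV_BYTES, stored.length - IV_BYTES);
    }

    // True only if the caller holds the right key; reading the bytes is not enough.
    static boolean canRead(SecretKey key, byte[] stored) {
        try {
            decrypt(key, stored);
            return true;
        } catch (GeneralSecurityException e) {
            return false;
        }
    }

    public static void main(String[] args) throws GeneralSecurityException {
        SecretKey owner = newKey();
        SecretKey superuser = newKey(); // can read the file bytes, but has a different key
        byte[] stored = encrypt(owner, "sensitive record".getBytes(StandardCharsets.UTF_8));
        System.out.println("owner can read: " + canRead(owner, stored));
        System.out.println("superuser can read: " + canRead(superuser, stored));
    }
}
```

In this sketch the stored bytes play the role of a world-readable file: anyone with read
permission can obtain them, but only the holder of the original key gets the plain text back.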


Other key points:

1) Encrypted and plain text files can coexist in a normal file system. 

2) Developers can plug in other encryption algorithms/standards (CMS, AES, custom encryption)
and thus have more flexibility.

3) Allows transporting keys/passwords/tokens from JobClient to tasks for use cases other
than encryption, such as connecting to a web service. MAPREDUCE-4491 adds keyProtection, and
encryption uses it.

4) Keys can be managed in one central location. JobClient fetches them on behalf of the
user, like any other application.

If we look at these two approaches from a higher level, the local file system approach is an
internal approach to encryption and the MAPREDUCE-4491 approach is an external one. The same
two choices exist in normal (non-distributed) application development, where developers can
rely on the file system to provide encryption or do the encryption themselves. Both approaches
have tradeoffs and flexibilities, and we choose between them based on our use cases and
needs. So I believe we should provide both alternatives in Hadoop.

In addition, this feature allows key protection in general, which can be used for purposes
other than encryption. The keys are also encrypted when stored on disk and decrypted only
in memory.
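
As a rough sketch of "encrypted when stored on disk and decrypted only in memory", the
standard java.security.KeyStore API already behaves this way for a password-protected
keystore. The example below assumes the JCEKS keystore type and uses hypothetical names
(ProtectedKeyStoreSketch, saveKey, loadKey); it does not show how the patch integrates
keystores or transports keys from JobClient to tasks.

```java
import javax.crypto.SecretKey;
import javax.crypto.spec.SecretKeySpec;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.security.GeneralSecurityException;
import java.security.Key;
import java.security.KeyStore;
import java.util.Arrays;

// Hypothetical illustration only; not part of the MAPREDUCE-4491 patch.
class ProtectedKeyStoreSketch {

    // Serializes a keystore holding one password-protected secret key.
    // The returned bytes are what would be written to disk.
    static byte[] saveKey(String alias, SecretKey key, char[] password)
            throws GeneralSecurityException, IOException {
        KeyStore store = KeyStore.getInstance("JCEKS");
        store.load(null, password); // initialise an empty keystore
        store.setEntry(alias, new KeyStore.SecretKeyEntry(key),
                new KeyStore.PasswordProtection(password));
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        store.store(out, password);
        return out.toByteArray();
    }

    // Recovers the key in memory; fails if the password is wrong.
    static Key loadKey(byte[] stored, String alias, char[] password)
            throws GeneralSecurityException, IOException {
        KeyStore store = KeyStore.getInstance("JCEKS");
        store.load(new ByteArrayInputStream(stored), password);
        return store.getKey(alias, password);
    }

    public static void main(String[] args) throws Exception {
        SecretKey key = new SecretKeySpec(new byte[16], "AES"); // dummy key material
        char[] password = "changeit".toCharArray();
        byte[] onDisk = saveKey("jobkey", key, password);
        Key inMemory = loadKey(onDisk, "jobkey", password);
        System.out.println("key recovered: "
                + Arrays.equals(key.getEncoded(), inMemory.getEncoded()));
    }
}
```

The same pattern would apply to passwords or tokens used for purposes other than
encryption, as long as they can be wrapped as a key entry.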

                
> Encryption and Key Protection
> -----------------------------
>
>                 Key: MAPREDUCE-4491
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4491
>             Project: Hadoop Map/Reduce
>          Issue Type: New Feature
>          Components: documentation, security, task-controller, tasktracker
>            Reporter: Benoy Antony
>            Assignee: Benoy Antony
>         Attachments: Hadoop_Encryption.pdf, Hadoop_Encryption.pdf
>
>
> When dealing with sensitive data, it is required to keep the data encrypted wherever
> it is stored. Common use case is to pull encrypted data out of a datasource and store in HDFS
> for analysis. The keys are stored in an external keystore.
> The feature adds a customizable framework to integrate different types of keystores,
> support for Java KeyStore, read keys from keystores, and transport keys from JobClient to
> Tasks.
> The feature adds PGP encryption as a codec and additional utilities to perform encryption
> related steps.
> The design document is attached. It explains the requirement, design and use cases.
> Kindly review and comment. Collaboration is very much welcome.
> I have a tested patch for this for 1.1 and will upload it soon as an initial work for
> further refinement.
> Update: The patches are uploaded to subtasks.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

       
