hadoop-mapreduce-issues mailing list archives

From "Konstantin Shvachko (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (MAPREDUCE-4491) Encryption and Key Protection
Date Tue, 28 Aug 2012 06:26:09 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-4491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13442982#comment-13442982 ]

Konstantin Shvachko edited comment on MAPREDUCE-4491 at 8/28/12 5:25 PM:
-------------------------------------------------------------------------

Benoy, I went over your design document. It is a pretty comprehensive description.
I want to clarify a couple of things.
# Do I understand correctly that your approach can be used to securely store (encrypt) data
even on non-secure (security=simple) clusters?
# So the JobClient uses the current user's credentials to obtain keys from the KeyStore, encrypts them
with the cluster-public-key, and sends them to the cluster along with the user credentials. The JobTracker
has nothing to do with the keys and passes the encrypted blob over to the TaskTrackers scheduled
to execute the tasks. The TT decrypts the user keys using the cluster-private-key and hands them
to the local tasks, which is secure as the keys never travel over the wire in the clear. Is that right so far?
# Should the TT be using user credentials to decrypt the blob of keys somehow? Or does it authenticate
the user and then decrypt if authentication passes? I did not find this in your document.
# How is the cluster-private-key delivered to the TTs?
# I think the configuration parameter naming needs some changes. The parameters should not start with {{mapreduce.job}}.
Based on your examples you can encrypt an HDFS file without spawning any actual jobs,
and in that case seeing {{mapreduce.job.*}} seems confusing.
My suggestion is to prefix all parameters with simply {{hadoop.crypto.*}}. Then you can also use,
e.g., the full word "keystore" instead of "ks".
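
To illustrate the naming I have in mind (parameter names here are purely hypothetical, not taken from the patch), something like:

```xml
<!-- Hypothetical names following the hadoop.crypto.* suggestion;
     the actual names would be defined by the patch. -->
<property>
  <name>hadoop.crypto.keystore.type</name>
  <value>JCEKS</value>
</property>
<property>
  <name>hadoop.crypto.keystore.location</name>
  <value>/etc/hadoop/keys.jceks</value>
</property>
```

This keeps encryption settings usable outside of any MapReduce job context.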

I plan to get into reviewing the implementation soon.
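
As I read it, the flow in question 2 amounts to standard envelope encryption. A minimal sketch in plain JCE, with no Hadoop types; the cluster key pair and the user's data key are stand-ins generated on the fly, assuming RSA-OAEP key wrapping (the document may specify a different scheme):

```java
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.util.Arrays;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;

public class EnvelopeEncryptionSketch {
    public static void main(String[] args) throws Exception {
        // Cluster key pair: the public half is known to the JobClient,
        // the private half is held by the TaskTrackers (how it gets
        // there is one of the open questions above).
        KeyPairGenerator kpg = KeyPairGenerator.getInstance("RSA");
        kpg.initialize(2048);
        KeyPair clusterKeys = kpg.generateKeyPair();

        // The user's data key, fetched from the external KeyStore by the JobClient.
        KeyGenerator kg = KeyGenerator.getInstance("AES");
        kg.init(128);
        SecretKey userKey = kg.generateKey();

        // JobClient side: wrap the user key with the cluster public key.
        // The resulting blob is opaque to the JobTracker.
        Cipher wrap = Cipher.getInstance("RSA/ECB/OAEPWithSHA-256AndMGF1Padding");
        wrap.init(Cipher.WRAP_MODE, clusterKeys.getPublic());
        byte[] blob = wrap.wrap(userKey);

        // TaskTracker side: unwrap with the cluster private key and
        // hand the recovered key to the local task.
        Cipher unwrap = Cipher.getInstance("RSA/ECB/OAEPWithSHA-256AndMGF1Padding");
        unwrap.init(Cipher.UNWRAP_MODE, clusterKeys.getPrivate());
        SecretKey recovered =
            (SecretKey) unwrap.unwrap(blob, "AES", Cipher.SECRET_KEY);

        System.out.println(
            Arrays.equals(userKey.getEncoded(), recovered.getEncoded())
                ? "keys match" : "mismatch");
    }
}
```

If this matches the design, the only plaintext key material on the wire is the cluster public key, which is safe to distribute.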
                
> Encryption and Key Protection
> -----------------------------
>
>                 Key: MAPREDUCE-4491
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4491
>             Project: Hadoop Map/Reduce
>          Issue Type: New Feature
>          Components: documentation, security, task-controller, tasktracker
>            Reporter: Benoy Antony
>            Assignee: Benoy Antony
>         Attachments: Hadoop_Encryption.pdf, Hadoop_Encryption.pdf
>
>
> When dealing with sensitive data, it is required to keep the data encrypted wherever
> it is stored. A common use case is to pull encrypted data out of a datasource and store
> it in HDFS for analysis. The keys are stored in an external keystore.
> The feature adds a customizable framework to integrate different types of keystores,
> support for the Java KeyStore, the ability to read keys from keystores, and transport of
> keys from the JobClient to Tasks.
> The feature adds PGP encryption as a codec and additional utilities to perform
> encryption-related steps.
> The design document is attached. It explains the requirements, design, and use cases.
> Kindly review and comment. Collaboration is very much welcome.
> I have a tested patch for this for 1.1 and will upload it soon as initial work for
> further refinement.
> Update: The patches are uploaded to subtasks.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see: http://www.atlassian.com/software/jira
