hadoop-common-issues mailing list archives

From "Steve Moist (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (HADOOP-15006) Encrypt S3A data client-side with Hadoop libraries & Hadoop KMS
Date Wed, 03 Jan 2018 19:03:00 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-15006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16310101#comment-16310101
] 

Steve Moist edited comment on HADOOP-15006 at 1/3/18 7:02 PM:
--------------------------------------------------------------

{quote}
Before worrying about these, why not conduct some experiments? You could take S3A and modify
it to always encrypt client side with the same key, then run as many integration tests as
you can against it (Hive, Spark, impala, ...), and see what fails. I think that should be
a first step to anything client-side related
{quote}

I wrote a simple proof of concept back in May using the HDFS crypto streams to wrap the S3
streams with a fixed AES key and IV.  I was able to run the S3 integration tests without issue,
run TeraGen/TeraSort/TeraValidate without issue, and write various files of differing sizes
and compare the checksums.  That gave me enough confidence at the time to move forward with
writing the original proposal.  Unfortunately, I seem to have misplaced the work since it's
been so long.  I'll work on re-creating it in the next few weeks and post it here; I've got
a deadline I have to focus on for now instead.  After that I'll run some Hive/Impala/etc.
integration tests.  Besides, AES/CTR/NoPadding generates ciphertext the same size as the
plaintext, unlike the AWS SDK's AES/CBC/PKCS5Padding, which changes the file size.
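The size-preservation point above can be illustrated with a small sketch. This is not Hadoop or AWS SDK code; the helper names are hypothetical, and a 16-byte AES block size is assumed:

```python
# Hypothetical sketch: why AES/CTR/NoPadding keeps the object size unchanged
# while AES/CBC/PKCS5Padding grows it to the next block boundary.

def ctr_ciphertext_len(plaintext_len: int) -> int:
    """CTR turns the block cipher into a stream cipher: no padding,
    so ciphertext length equals plaintext length."""
    return plaintext_len

def cbc_pkcs5_ciphertext_len(plaintext_len: int, block_size: int = 16) -> int:
    """PKCS5/PKCS7 always appends 1..block_size padding bytes, so the
    ciphertext is rounded up to the next full block (a whole extra block
    is added when the plaintext is already block-aligned)."""
    pad = block_size - (plaintext_len % block_size)
    return plaintext_len + pad

for n in (15, 16, 100):
    print(n, ctr_ciphertext_len(n), cbc_pkcs5_ciphertext_len(n))
# CTR: 15 -> 15, 16 -> 16, 100 -> 100
# CBC+PKCS5: 15 -> 16, 16 -> 32, 100 -> 112
```

This is why a client-side CTR scheme would leave S3 object sizes (and hence tools that rely on content-length) untouched, whereas the CBC-based SDK encryption changes the reported file size.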


was (Author: moist):
{quote}
Before worrying about these, why not conduct some experiments? You could take S3A and modify
it to always encrypt client side with the same key, then run as many integration tests as
you can against it (Hive, Spark, impala, ...), and see what fails. I think that should be
a first step to anything client-side related
{quote}

I wrote a simple proof of concept back in May using the HDFS crypto streams to wrap the S3
streams with a fixed AES key and IV.  I was able to run the S3 integration tests without issue,
run TeraGen/TeraSort/TeraValidate without issue, and write various files of differing sizes
and compare the checksums.  That gave me enough confidence at the time to move forward with
writing the original proposal.  Unfortunately, I seem to have misplaced the work since it's
been so long.  I'll work on re-creating it in the next few weeks and post it here; I've got
a deadline I have to focus on for now instead.  Besides, AES/CTR/NoPadding generates
ciphertext the same size as the plaintext, unlike the AWS SDK's AES/CBC/PKCS5Padding, which
changes the file size.

> Encrypt S3A data client-side with Hadoop libraries & Hadoop KMS
> ---------------------------------------------------------------
>
>                 Key: HADOOP-15006
>                 URL: https://issues.apache.org/jira/browse/HADOOP-15006
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: fs/s3, kms
>            Reporter: Steve Moist
>            Priority: Minor
>         Attachments: S3-CSE Proposal.pdf
>
>
> This is for the proposal to introduce Client Side Encryption to S3 in such a way that
it can leverage HDFS transparent encryption, use the Hadoop KMS to manage keys, use the `hdfs
crypto` command line tools to manage encryption zones in the cloud, and enable distcp to copy
from HDFS to S3 (and vice-versa) with data still encrypted.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

