hadoop-mapreduce-issues mailing list archives

From "Alejandro Abdelnur (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (MAPREDUCE-5890) Support for encrypting Intermediate data and spills in local filesystem
Date Tue, 01 Jul 2014 04:51:25 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-5890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14048355#comment-14048355 ]

Alejandro Abdelnur edited comment on MAPREDUCE-5890 at 7/1/14 4:50 AM:
-----------------------------------------------------------------------

[~chris.douglas],
I had initially tried to modify the {{IFile}} format directly to handle the IV. I felt this
would not be a clean solution for the following reasons:
* The {{IFile}} format currently has no notion of an explicit header or metadata.
* While it is possible to write the IV in the {{IFile.Writer}} constructor (and thus make it
transparent to the rest of the code base), the reading code path is not so straightforward.
Two classes extend {{IFile.Reader}} ({{InMemoryReader}} and {{RawKVIteratorReader}}).
The {{InMemoryReader}} completely ignores the input stream that is initialized in the base class
constructor, and there are places in the code base where the input stream is initialized not
in the Reader but in the {{Segment::init()}} method. In my opinion this makes the {{IFile}}
abstraction a bit leaky: the underlying stream should be handled in its entirety in the
IFile Writer/Reader, and the {{Segment}} class (which is part of the {{Merger}} framework)
should avoid dealing with those internals.
* Also, I was not able to do away with a lot of if-then checks in the Shuffle phase (another
instance of the leaky abstraction mentioned in the previous point): the implementations of the
{{MapOutput::shuffle}} method create {{IFileInputStreams}} directly without an associated
{{IFile.Reader}}.
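
To illustrate the header idea discussed in the bullets, here is a minimal sketch of a stream that carries its IV as an unencrypted prefix: the writer side emits the IV before any encrypted bytes, and the reader side must consume it before decrypting. This is not Hadoop code and not what the attached patches do. The class {{IvHeaderSketch}} and its methods are hypothetical names, and AES in CTR mode with a 16-byte IV is an assumption made only for the example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import javax.crypto.Cipher;
import javax.crypto.CipherInputStream;
import javax.crypto.CipherOutputStream;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.IvParameterSpec;

// Hypothetical helper, not part of Hadoop: prefixes a stream with a plaintext IV.
public class IvHeaderSketch {
    static final int IV_LENGTH = 16; // AES block size in bytes (assumed cipher)

    // Writer side: generate a fresh IV, write it unencrypted, then encrypt what follows.
    static OutputStream wrapForWrite(OutputStream raw, SecretKey key)
            throws IOException, GeneralSecurityException {
        byte[] iv = new byte[IV_LENGTH];
        new SecureRandom().nextBytes(iv);
        raw.write(iv);
        Cipher cipher = Cipher.getInstance("AES/CTR/NoPadding");
        cipher.init(Cipher.ENCRYPT_MODE, key, new IvParameterSpec(iv));
        return new CipherOutputStream(raw, cipher);
    }

    // Reader side: consume the IV header first, then decrypt the rest of the stream.
    static InputStream wrapForRead(InputStream raw, SecretKey key)
            throws IOException, GeneralSecurityException {
        byte[] iv = new byte[IV_LENGTH];
        new DataInputStream(raw).readFully(iv);
        Cipher cipher = Cipher.getInstance("AES/CTR/NoPadding");
        cipher.init(Cipher.DECRYPT_MODE, key, new IvParameterSpec(iv));
        return new CipherInputStream(raw, cipher);
    }

    // Round trip: encrypt a payload into a buffer, then read it back through the reader side.
    static byte[] roundTrip(byte[] payload, SecretKey key)
            throws IOException, GeneralSecurityException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (OutputStream out = wrapForWrite(sink, key)) {
            out.write(payload);
        }
        ByteArrayOutputStream plain = new ByteArrayOutputStream();
        try (InputStream in = wrapForRead(new ByteArrayInputStream(sink.toByteArray()), key)) {
            byte[] buffer = new byte[4096];
            int n;
            while ((n = in.read(buffer)) != -1) {
                plain.write(buffer, 0, n);
            }
        }
        return plain.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        KeyGenerator generator = KeyGenerator.getInstance("AES");
        generator.init(128);
        SecretKey key = generator.generateKey();
        byte[] payload = "key1\tvalue1".getBytes(StandardCharsets.UTF_8);
        System.out.println(new String(roundTrip(payload, key), StandardCharsets.UTF_8));
    }
}
```

The sketch only works because every reader goes through {{wrapForRead}}. A code path that opens the underlying stream itself, as described above for {{Segment::init()}} and {{MapOutput::shuffle}}, would see the IV bytes as data unless it repeated the same header handling.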


was (Author: asuresh):
[~chris.douglas],
I had initially tried to modify the {{IFile}} format directly to handle the IV. I felt this
would not be a clean solution for the following reasons:
* The {{IFile}} format currently has no notion of an explicit header or metadata.
* While it is possible to write the IV in the {{IFile.Writer}} constructor (and thus make it
transparent to the rest of the code base), the reading code path is not so straightforward.
Two classes extend {{IFile.Reader}} ({{InMemoryReader}} and {{RawKVIteratorReader}}).
The {{InMemoryReader}} completely ignores the input stream that is initialized in the base class
constructor, and there are places in the code base where the input stream is initialized not
in the Reader but in the {{Segment::init()}} method. In my opinion this makes the {{IFile}}
abstraction a bit leaky: the underlying stream should be handled in its entirety in the
IFile Writer/Reader, and the {{Segment}} class (which is part of the {{Merger}} framework)
should avoid dealing with those internals.
* Also, I was not able to do away with a lot of if-then checks in the Shuffle phase (another
instance of the leaky abstraction mentioned in the previous point): the implementations of the
{{MapOutput::shuffle}} method create {{IFileInputStream}}s directly without an associated
{{IFile.Reader}}.

> Support for encrypting Intermediate data and spills in local filesystem
> -----------------------------------------------------------------------
>
>                 Key: MAPREDUCE-5890
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5890
>             Project: Hadoop Map/Reduce
>          Issue Type: New Feature
>          Components: security
>    Affects Versions: 2.4.0
>            Reporter: Alejandro Abdelnur
>            Assignee: Arun Suresh
>              Labels: encryption
>         Attachments: MAPREDUCE-5890.10.patch, MAPREDUCE-5890.11.patch, MAPREDUCE-5890.12.patch,
>                      MAPREDUCE-5890.3.patch, MAPREDUCE-5890.4.patch, MAPREDUCE-5890.5.patch,
>                      MAPREDUCE-5890.6.patch, MAPREDUCE-5890.7.patch, MAPREDUCE-5890.8.patch,
>                      MAPREDUCE-5890.9.patch,
>                      org.apache.hadoop.mapred.TestMRIntermediateDataEncryption-output.txt,
>                      syslog.tar.gz
>
>
> For some sensitive data, encryption while in flight (over the network) is not sufficient;
> the data must also be encrypted while at rest. HADOOP-10150 and HDFS-6134 bring encryption
> at rest for data in filesystems accessed through the Hadoop FileSystem API. MapReduce
> intermediate data and spills should also be encrypted while at rest.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
