Date: Tue, 1 Jul 2014 04:51:25 +0000 (UTC)
From: "Alejandro Abdelnur (JIRA)"
To: mapreduce-issues@hadoop.apache.org
Subject: [jira] [Comment Edited] (MAPREDUCE-5890) Support for encrypting Intermediate data and spills in local filesystem

    [ https://issues.apache.org/jira/browse/MAPREDUCE-5890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14048355#comment-14048355 ]

Alejandro Abdelnur edited comment on MAPREDUCE-5890 at 7/1/14 4:50 AM (original author: asuresh):
-----------------------------------------------------------------------

[~chris.douglas], I had initially tried to modify the {{IFile}} format directly to handle the IV. I felt this would not be a clean solution because:
* The {{IFile}} currently has no notion of an explicit header/metadata.
* While it is possible to write the IV in the {{IFile.Writer}} constructor (thus making it transparent to the rest of the code base), the read path is not so straightforward. Two classes extend {{IFile.Reader}}: {{InMemoryReader}} and {{RawKVIteratorReader}}. {{InMemoryReader}} completely ignores the input stream initialized in the base-class constructor, and in some places in the code base the input stream is initialized not in the Reader but in the {{Segment::init()}} method. In my opinion this makes the {{IFile}} abstraction a bit leaky: the underlying stream should be handled entirely within the {{IFile}} Writer/Reader, and the {{Segment}} class (part of the {{Merger}} framework) should not have to deal with those stream internals.
* Also, I was not able to eliminate many if-then checks in the shuffle phase (another instance of the leaky abstraction described in the previous point): the implementations of the {{MapOutput::shuffle}} method create {{IFileInputStream}}s directly, without an associated {{IFile.Reader}}.

> Support for encrypting Intermediate data and spills in local filesystem
> -----------------------------------------------------------------------
>
>                 Key: MAPREDUCE-5890
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5890
>             Project: Hadoop Map/Reduce
>          Issue Type: New Feature
>          Components: security
>    Affects Versions: 2.4.0
>            Reporter: Alejandro Abdelnur
>            Assignee: Arun Suresh
>              Labels: encryption
>         Attachments: MAPREDUCE-5890.10.patch, MAPREDUCE-5890.11.patch, MAPREDUCE-5890.12.patch, MAPREDUCE-5890.3.patch, MAPREDUCE-5890.4.patch, MAPREDUCE-5890.5.patch, MAPREDUCE-5890.6.patch, MAPREDUCE-5890.7.patch, MAPREDUCE-5890.8.patch, MAPREDUCE-5890.9.patch, org.apache.hadoop.mapred.TestMRIntermediateDataEncryption-output.txt, syslog.tar.gz
>
>
> For some sensitive data, encryption while in flight (network) is not sufficient; the data must also be encrypted while at rest. HADOOP-10150 and HDFS-6134 bring encryption at rest to filesystem data accessed through the Hadoop FileSystem API. MapReduce intermediate data and spills should also be encrypted while at rest.
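A minimal sketch of the writer-side idea discussed in the comment: the IV is emitted as a small plaintext header when the stream is opened, and every reader must consume that header before decrypting. It uses only the JDK's javax.crypto. The class {{IvHeaderStreams}}, its methods, and the choice of AES/CTR with a random 16-byte IV are illustrative assumptions; this is not Hadoop's {{IFile}} API or the approach taken in the attached patches.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import javax.crypto.Cipher;
import javax.crypto.CipherInputStream;
import javax.crypto.CipherOutputStream;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.IvParameterSpec;

/** Hypothetical helper (not part of Hadoop): an encrypted stream prefixed by a plaintext IV header. */
public final class IvHeaderStreams {
  static final int IV_LENGTH = 16; // AES block size in bytes; CTR needs a full counter block
  private static final String TRANSFORM = "AES/CTR/NoPadding";
  private static final SecureRandom RANDOM = new SecureRandom();

  private IvHeaderStreams() {}

  /** Writes a fresh random IV (never reuse one under the same key in CTR), then encrypts the rest. */
  public static OutputStream wrapForWrite(OutputStream out, SecretKey key)
      throws IOException, GeneralSecurityException {
    byte[] iv = new byte[IV_LENGTH];
    RANDOM.nextBytes(iv);
    out.write(iv); // header: stored in the clear, ahead of the ciphertext
    Cipher cipher = Cipher.getInstance(TRANSFORM);
    cipher.init(Cipher.ENCRYPT_MODE, key, new IvParameterSpec(iv));
    return new CipherOutputStream(out, cipher);
  }

  /** Consumes the IV header, then decrypts the remainder; every read path must go through this. */
  public static InputStream wrapForRead(InputStream in, SecretKey key)
      throws IOException, GeneralSecurityException {
    byte[] iv = new byte[IV_LENGTH];
    new DataInputStream(in).readFully(iv); // unbuffered, so 'in' is left positioned just past the IV
    Cipher cipher = Cipher.getInstance(TRANSFORM);
    cipher.init(Cipher.DECRYPT_MODE, key, new IvParameterSpec(iv));
    return new CipherInputStream(in, cipher);
  }

  /** Encrypts a payload into memory and decrypts it again (requires Java 9+ for readAllBytes). */
  public static byte[] roundTrip(byte[] payload, SecretKey key)
      throws IOException, GeneralSecurityException {
    ByteArrayOutputStream sink = new ByteArrayOutputStream();
    try (OutputStream enc = wrapForWrite(sink, key)) {
      enc.write(payload);
    }
    try (InputStream dec = wrapForRead(new ByteArrayInputStream(sink.toByteArray()), key)) {
      return dec.readAllBytes();
    }
  }

  public static void main(String[] args) throws Exception {
    KeyGenerator gen = KeyGenerator.getInstance("AES");
    gen.init(128);
    SecretKey key = gen.generateKey();
    byte[] spill = "key1\tvalue1\nkey2\tvalue2".getBytes(StandardCharsets.UTF_8);
    System.out.println(new String(roundTrip(spill, key), StandardCharsets.UTF_8));
  }
}
```

The write side is a single call made once when the stream is opened, which is why it can be hidden inside a writer constructor. The read side is only correct if every code path that opens the stream calls the read-side wrapper first; per the comment, {{InMemoryReader}}, {{Segment::init()}} and {{MapOutput::shuffle}} each open streams on their own, so each would need that call.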