From: "Krish Dey (JIRA)"
To: issues@spark.apache.org
Date: Wed, 9 Nov 2016 22:29:58 +0000 (UTC)
Subject: [jira] [Comment Edited] (SPARK-5682) Add encrypted shuffle in spark

[ https://issues.apache.org/jira/browse/SPARK-5682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15652217#comment-15652217 ]

Krish Dey edited comment on SPARK-5682 at 11/9/16 10:29 PM:
------------------------------------------------------------

The constructor still seems to be the same as it is.
Doesn't this need to be changed to accommodate encryption of the spill to disk? Moreover, instead of passing DummySerializerInstance, it should be allowed to pass any Serializer.

  public UnsafeSorterSpillWriter(
      BlockManager blockManager,
      int fileBufferSize,
      ShuffleWriteMetrics writeMetrics,
      int numRecordsToWrite) throws IOException {
    final Tuple2<TempLocalBlockId, File> spilledFileInfo =
        blockManager.diskBlockManager().createTempLocalBlock();
    this.file = spilledFileInfo._2();
    this.blockId = spilledFileInfo._1();
    this.numRecordsToWrite = numRecordsToWrite;
    // Unfortunately, we need a serializer instance in order to construct a DiskBlockObjectWriter.
    // Our write path doesn't actually use this serializer (since we end up calling the `write()`
    // OutputStream methods), but DiskBlockObjectWriter still calls some methods on it. To work
    // around this, we pass a dummy no-op serializer.
    writer = blockManager.getDiskWriter(
        blockId, file, DummySerializerInstance.INSTANCE, fileBufferSize, writeMetrics);
    // Write the number of records
    writeIntToBuffer(numRecordsToWrite, 0);
    writer.write(writeBuffer, 0, 4);
  }

was (Author: krish.dey):
The method still seems to be the same as it is. Doesn't this need to be changed to accommodate encryption of the spill to disk?

  public UnsafeSorterSpillWriter(
      BlockManager blockManager,
      int fileBufferSize,
      ShuffleWriteMetrics writeMetrics,
      int numRecordsToWrite) throws IOException {
    final Tuple2<TempLocalBlockId, File> spilledFileInfo =
        blockManager.diskBlockManager().createTempLocalBlock();
    this.file = spilledFileInfo._2();
    this.blockId = spilledFileInfo._1();
    this.numRecordsToWrite = numRecordsToWrite;
    // Unfortunately, we need a serializer instance in order to construct a DiskBlockObjectWriter.
    // Our write path doesn't actually use this serializer (since we end up calling the `write()`
    // OutputStream methods), but DiskBlockObjectWriter still calls some methods on it. To work
    // around this, we pass a dummy no-op serializer.
    writer = blockManager.getDiskWriter(
        blockId, file, DummySerializerInstance.INSTANCE, fileBufferSize, writeMetrics);
    // Write the number of records
    writeIntToBuffer(numRecordsToWrite, 0);
    writer.write(writeBuffer, 0, 4);
  }

> Add encrypted shuffle in spark
> ------------------------------
>
>            Key: SPARK-5682
>            URL: https://issues.apache.org/jira/browse/SPARK-5682
>        Project: Spark
>     Issue Type: New Feature
>     Components: Shuffle
>       Reporter: liyunzhang_intel
>       Assignee: Ferdinand Xu
>        Fix For: 2.1.0
>
>    Attachments: Design Document of Encrypted Spark Shuffle_20150209.docx, Design Document of Encrypted Spark Shuffle_20150318.docx, Design Document of Encrypted Spark Shuffle_20150402.docx, Design Document of Encrypted Spark Shuffle_20150506.docx
>
>
> Encrypted shuffle is enabled in Hadoop 2.6, which makes shuffle data safer. This feature is necessary in Spark. AES is a specification for the encryption of electronic data; among its common modes of operation, CTR is the one used here. We use two codecs, JceAesCtrCryptoCodec and OpensslAesCtrCryptoCodec, to enable Spark encrypted shuffle; the same codecs are used in Hadoop encrypted shuffle. JceAesCtrCryptoCodec uses the encryption algorithms the JDK provides, while OpensslAesCtrCryptoCodec uses the encryption algorithms OpenSSL provides.
> Because UGI credential info is used in the process of encrypted shuffle, we first enable encrypted shuffle on the Spark-on-YARN framework.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org
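[Editor's note] The spill-encryption idea discussed in the comment above can be illustrated without Spark's classes. The sketch below uses only the JDK's javax.crypto API (not Spark's or Hadoop's actual CryptoCodec interfaces, and the fixed key/IV are placeholder assumptions): the write path wraps the spill file's OutputStream in an AES/CTR CipherOutputStream, and the read path unwraps it symmetrically.

```java
import javax.crypto.Cipher;
import javax.crypto.CipherInputStream;
import javax.crypto.CipherOutputStream;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

public class SpillEncryptionSketch {
    public static void main(String[] args) throws Exception {
        // Placeholder key/IV: a real implementation would derive the key from
        // the application's credentials and use a fresh random IV per spill file.
        byte[] key = new byte[16];
        byte[] iv = new byte[16];
        SecretKeySpec keySpec = new SecretKeySpec(key, "AES");

        // Write path: the spill writer wraps its file stream in an encrypting stream.
        Cipher enc = Cipher.getInstance("AES/CTR/NoPadding");
        enc.init(Cipher.ENCRYPT_MODE, keySpec, new IvParameterSpec(iv));
        ByteArrayOutputStream spillFile = new ByteArrayOutputStream(); // stands in for the temp file
        try (OutputStream out = new CipherOutputStream(spillFile, enc)) {
            out.write("spilled records".getBytes(StandardCharsets.UTF_8));
        }

        // Read path: the spill reader wraps its input stream with the matching cipher.
        Cipher dec = Cipher.getInstance("AES/CTR/NoPadding");
        dec.init(Cipher.DECRYPT_MODE, keySpec, new IvParameterSpec(iv));
        ByteArrayOutputStream plain = new ByteArrayOutputStream();
        try (InputStream cin = new CipherInputStream(
                new ByteArrayInputStream(spillFile.toByteArray()), dec)) {
            byte[] buf = new byte[64];
            int n;
            while ((n = cin.read(buf)) != -1) {
                plain.write(buf, 0, n);
            }
        }
        System.out.println(plain.toString("UTF-8")); // round-trips the original record bytes
    }
}
```

CTR mode is a natural fit here because it turns AES into a stream cipher: no padding is needed, so the byte-oriented write() path shown in the constructor above would not change shape, only the stream it writes through.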