spark-issues mailing list archives

From "Krish Dey (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (SPARK-5682) Add encrypted shuffle in spark
Date Wed, 09 Nov 2016 22:29:58 GMT

    [ https://issues.apache.org/jira/browse/SPARK-5682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15652217#comment-15652217 ]

Krish Dey edited comment on SPARK-5682 at 11/9/16 10:29 PM:
------------------------------------------------------------

The constructor still seems to be unchanged. Shouldn't it be changed to accommodate
encryption of the spill to disk? Moreover, instead of hardcoding DummySerializerInstance,
it should be possible to pass any Serializer.

public UnsafeSorterSpillWriter(BlockManager blockManager, int fileBufferSize,
    ShuffleWriteMetrics writeMetrics, int numRecordsToWrite) throws IOException {
  final Tuple2<TempLocalBlockId, File> spilledFileInfo =
      blockManager.diskBlockManager().createTempLocalBlock();
  this.file = spilledFileInfo._2();
  this.blockId = spilledFileInfo._1();
  this.numRecordsToWrite = numRecordsToWrite;
  // Unfortunately, we need a serializer instance in order to construct a DiskBlockObjectWriter.
  // Our write path doesn't actually use this serializer (since we end up calling the `write()`
  // OutputStream methods), but DiskBlockObjectWriter still calls some methods on it. To work
  // around this, we pass a dummy no-op serializer.
  writer = blockManager.getDiskWriter(
      blockId, file, DummySerializerInstance.INSTANCE, fileBufferSize, writeMetrics);
  // Write the number of records
  writeIntToBuffer(numRecordsToWrite, 0);
  writer.write(writeBuffer, 0, 4);
}
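To illustrate what is being asked for: instead of hardcoding DummySerializerInstance.INSTANCE, the spill writer could accept whichever serializer instance the caller supplies, so an encrypting one can be swapped in for spill-to-disk. The sketch below is hypothetical and not Spark's actual API; the interface, class names, and the toy XOR "cipher" (standing in for a real codec) are all illustrative.

```java
// Hypothetical sketch of dependency-injecting the serializer instead of
// hardcoding a dummy. All names here are illustrative, not Spark's API.
interface SerializerInstance {
    byte[] wrap(byte[] data);
}

// Mirrors the no-op dummy the real code passes today.
class DummySerializerInstance implements SerializerInstance {
    static final DummySerializerInstance INSTANCE = new DummySerializerInstance();
    public byte[] wrap(byte[] data) { return data; } // pass-through, no encryption
}

// Toy stand-in for an encrypting serializer (XOR is NOT real encryption;
// a real implementation would use an AES-CTR codec).
class XorEncryptingSerializer implements SerializerInstance {
    private final byte key;
    XorEncryptingSerializer(byte key) { this.key = key; }
    public byte[] wrap(byte[] data) {
        byte[] out = new byte[data.length];
        for (int i = 0; i < data.length; i++) out[i] = (byte) (data[i] ^ key);
        return out;
    }
}

// The spill writer takes whatever serializer it is given, so callers can
// choose plaintext or encrypted spills without changing the writer itself.
class SpillWriter {
    private final SerializerInstance serializer; // injected, not hardcoded
    SpillWriter(SerializerInstance serializer) { this.serializer = serializer; }
    byte[] write(byte[] record) { return serializer.wrap(record); }
}
```

With this shape, the existing behavior is preserved by passing DummySerializerInstance.INSTANCE, while an encrypting instance enables encrypted spill files.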



was (Author: krish.dey):
The method still seems to be unchanged. Shouldn't it be changed to accommodate
encryption of the spill to disk?

public UnsafeSorterSpillWriter(BlockManager blockManager, int fileBufferSize,
    ShuffleWriteMetrics writeMetrics, int numRecordsToWrite) throws IOException {
  final Tuple2<TempLocalBlockId, File> spilledFileInfo =
      blockManager.diskBlockManager().createTempLocalBlock();
  this.file = spilledFileInfo._2();
  this.blockId = spilledFileInfo._1();
  this.numRecordsToWrite = numRecordsToWrite;
  // Unfortunately, we need a serializer instance in order to construct a DiskBlockObjectWriter.
  // Our write path doesn't actually use this serializer (since we end up calling the `write()`
  // OutputStream methods), but DiskBlockObjectWriter still calls some methods on it. To work
  // around this, we pass a dummy no-op serializer.
  writer = blockManager.getDiskWriter(
      blockId, file, DummySerializerInstance.INSTANCE, fileBufferSize, writeMetrics);
  // Write the number of records
  writeIntToBuffer(numRecordsToWrite, 0);
  writer.write(writeBuffer, 0, 4);
}


> Add encrypted shuffle in spark
> ------------------------------
>
>                 Key: SPARK-5682
>                 URL: https://issues.apache.org/jira/browse/SPARK-5682
>             Project: Spark
>          Issue Type: New Feature
>          Components: Shuffle
>            Reporter: liyunzhang_intel
>            Assignee: Ferdinand Xu
>             Fix For: 2.1.0
>
>         Attachments: Design Document of Encrypted Spark Shuffle_20150209.docx, Design Document of Encrypted Spark Shuffle_20150318.docx, Design Document of Encrypted Spark Shuffle_20150402.docx, Design Document of Encrypted Spark Shuffle_20150506.docx
>
>
> Encrypted shuffle is enabled in hadoop 2.6, which makes the shuffle data process safer. This feature is necessary in spark. AES is a specification for the encryption of electronic data; it has 5 common modes, of which CTR is one. We use two codecs, JceAesCtrCryptoCodec and OpensslAesCtrCryptoCodec, to enable spark encrypted shuffle; they are also used in hadoop encrypted shuffle. JceAesCtrCryptoCodec uses the encryption algorithms the JDK provides, while OpensslAesCtrCryptoCodec uses the encryption algorithms OpenSSL provides.
> Because ugi credential info is used in the process of encrypted shuffle, we first enable encrypted shuffle on the spark-on-yarn framework.
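[Editorial note: the JCE-based codec mentioned in the description boils down to wrapping the raw output stream in an AES/CTR cipher stream from the JDK. A minimal sketch, assuming a fixed key and IV for illustration only (a real codec derives these from credentials), follows; the class and method names are hypothetical.]

```java
import java.io.ByteArrayOutputStream;
import javax.crypto.Cipher;
import javax.crypto.CipherOutputStream;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical helper showing the core of a JCE AES-CTR codec: bytes
// written to the stream are transformed by the cipher as they pass through.
class AesCtrStreams {
    static byte[] transform(int mode, byte[] key, byte[] iv, byte[] data) throws Exception {
        // AES in CTR mode is a stream cipher, so no padding is needed.
        Cipher cipher = Cipher.getInstance("AES/CTR/NoPadding");
        cipher.init(mode, new SecretKeySpec(key, "AES"), new IvParameterSpec(iv));
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (CipherOutputStream out = new CipherOutputStream(sink, cipher)) {
            out.write(data); // encrypted (or decrypted) on the way through
        }
        return sink.toByteArray();
    }
}
```

In a shuffle or spill writer, the same CipherOutputStream wrapping would be applied to the disk writer's underlying stream, making encryption transparent to the code that writes records.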



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org

