Date: Fri, 8 Aug 2014 13:04:11 +0000 (UTC)
From: "Saisai Shao (JIRA)"
To: issues@spark.apache.org
Subject: [jira] [Updated] (SPARK-2926) Add MR-style (merge-sort) SortShuffleReader for sort-based shuffle

     [ https://issues.apache.org/jira/browse/SPARK-2926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Saisai Shao updated SPARK-2926:
-------------------------------
    Attachment: SortBasedShuffleRead.pdf

A rough design doc is uploaded. Any comments would be greatly appreciated.

> Add MR-style (merge-sort) SortShuffleReader for sort-based shuffle
> ------------------------------------------------------------------
>
>                 Key: SPARK-2926
>                 URL: https://issues.apache.org/jira/browse/SPARK-2926
>             Project: Spark
>          Issue Type: Improvement
>          Components: Shuffle
>    Affects Versions: 1.1.0
>            Reporter: Saisai Shao
>         Attachments: SortBasedShuffleRead.pdf
>
>
> Spark has already integrated sort-based shuffle write, which greatly improves IO performance and reduces memory consumption when the number of reducers is very large. On the reducer side, however, it still uses the hash-based shuffle reader implementation, which in some situations ignores the ordering of the map output data.
> Here we propose an MR-style, sort-merge shuffle reader for sort-based shuffle to further improve its performance.
> Work-in-progress code and a performance test report will be posted later, once some unit test bugs are fixed.
> Any comments would be greatly appreciated.
> Thanks a lot.
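
For illustration only, and not taken from the attached design doc: a minimal Python sketch of the merge-sort read idea described in the issue, assuming each map output fetched by a reducer is a run of (key, value) pairs already sorted by key. The names merge_sorted_runs and combine are hypothetical and are not Spark APIs.

```python
import heapq
from typing import Callable, Iterable, Iterator, List, Tuple

Record = Tuple[str, int]


def merge_sorted_runs(runs: List[Iterable[Record]]) -> Iterator[Record]:
    """K-way merge of runs that are each already sorted by key."""
    iterators = [iter(run) for run in runs]
    heap = []
    for idx, it in enumerate(iterators):
        first = next(it, None)
        if first is not None:
            # The run index breaks ties, so values are never compared.
            heap.append((first[0], idx, first[1]))
    heapq.heapify(heap)
    while heap:
        key, idx, value = heapq.heappop(heap)
        yield key, value
        # Refill the heap from the run the popped record came from.
        nxt = next(iterators[idx], None)
        if nxt is not None:
            heapq.heappush(heap, (nxt[0], idx, nxt[1]))


def combine(sorted_records: Iterable[Record],
            reduce_fn: Callable[[int, int], int]) -> Iterator[Record]:
    """Fold adjacent records with equal keys; relies on sorted input."""
    have_current = False
    current_key = None
    acc = None
    for key, value in sorted_records:
        if have_current and key == current_key:
            acc = reduce_fn(acc, value)
        else:
            if have_current:
                yield current_key, acc
            current_key, acc, have_current = key, value, True
    if have_current:
        yield current_key, acc


# Hypothetical sorted map outputs for one reduce partition.
runs = [[("a", 1), ("c", 3)], [("a", 2), ("b", 5)], []]
merged = list(merge_sorted_runs(runs))
summed = list(combine(merge_sorted_runs(runs), lambda x, y: x + y))
```

In this sketch the merge holds only one pending record per run in the heap, and because equal keys come out adjacent, the combine step needs to track a single key at a time.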