Date: Tue, 11 Apr 2017 16:23:41 +0000 (UTC)
From: "jin xing (JIRA)"
To: issues@spark.apache.org
Subject: [jira] [Commented] (SPARK-19659) Fetch big blocks to disk when shuffle-read

[ https://issues.apache.org/jira/browse/SPARK-19659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15964596#comment-15964596 ]

jin xing commented on SPARK-19659:
----------------------------------

*bytesShuffleToMemory* is different from *bytesInFlight*. *bytesInFlight* belongs to a single *ShuffleBlockFetcherIterator* and is decreased as soon as the remote blocks are received. *bytesShuffleToMemory* is shared across all ShuffleBlockFetcherIterators and is decreased only when the reference count of the ByteBuf drops to zero (though the memory may still sit in a cache and not be truly destroyed). If *spark.reducer.maxReqsInFlight* is only for memory control, I think *spark.reducer.maxBytesShuffleToMemory* is an improvement. In the current PR, I want to keep the logic simple: memory is tracked by *bytesShuffleToMemory*, and memory usage is not tracked by the MemoryManager.
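To make the distinction between the two counters concrete, here is a rough sketch of the accounting described above. This is illustrative only, not Spark's actual internals; the class names (GlobalShuffleMemory, FetchedBlock, FetcherIterator) are hypothetical, and the refcounting merely mimics Netty's ByteBuf behavior:

```java
import java.util.concurrent.atomic.AtomicLong;

// Shared by ALL fetcher iterators in the executor (hypothetical sketch).
final class GlobalShuffleMemory {
    static final AtomicLong bytesShuffleToMemory = new AtomicLong(0);
}

// Mimics a refcounted ByteBuf: the global counter drops only when the
// refcount reaches zero, even if the allocator still caches the memory.
final class FetchedBlock {
    private int refCount = 1;
    private final long size;

    FetchedBlock(long size) {
        this.size = size;
        GlobalShuffleMemory.bytesShuffleToMemory.addAndGet(size);
    }

    void retain() { refCount++; }

    void release() {
        if (--refCount == 0) {
            GlobalShuffleMemory.bytesShuffleToMemory.addAndGet(-size);
        }
    }
}

// Per-iterator accounting: bytesInFlight is private to ONE iterator and
// is decreased as soon as a requested block arrives.
final class FetcherIterator {
    private long bytesInFlight = 0;

    void onBlockRequested(long size) { bytesInFlight += size; }

    FetchedBlock onBlockReceived(long size) {
        bytesInFlight -= size;          // in-flight accounting ends here...
        return new FetchedBlock(size);  // ...but global accounting begins
    }

    long bytesInFlight() { return bytesInFlight; }
}
```

The key point the sketch illustrates: a block leaves *bytesInFlight* the moment it is received, but stays in *bytesShuffleToMemory* for as long as any consumer holds a reference to it.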
> Fetch big blocks to disk when shuffle-read
> ------------------------------------------
>
>                 Key: SPARK-19659
>                 URL: https://issues.apache.org/jira/browse/SPARK-19659
>             Project: Spark
>          Issue Type: Improvement
>          Components: Shuffle
>    Affects Versions: 2.1.0
>            Reporter: jin xing
>         Attachments: SPARK-19659-design-v1.pdf, SPARK-19659-design-v2.pdf
>
>
> Currently the whole block is fetched into memory (off-heap by default) during shuffle-read. A block is identified by (shuffleId, mapId, reduceId), so it can be large in skew situations. If OOM happens during shuffle-read, the job is killed and users are notified to "Consider boosting spark.yarn.executor.memoryOverhead". Adjusting the parameter and allocating more memory can resolve the OOM, but this approach is not well suited to a production environment, especially a data warehouse.
> Using Spark SQL as the data engine in a warehouse, users hope for a unified parameter (e.g. memory) with less resource wasted (resource allocated but not used). Skew situations are not always easy to predict; when they happen, it makes sense to fetch remote blocks to disk for shuffle-read rather than kill the job because of OOM. This approach was mentioned during the discussion in SPARK-3019, by [~sandyr] and [~mridulm80].

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
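The fetch-to-disk idea in the description above amounts to a threshold check before each remote fetch. A minimal sketch, assuming the proposed *spark.reducer.maxBytesShuffleToMemory* knob; the class and method names here are hypothetical, not Spark's code:

```java
// Hypothetical sketch: fetch a remote block to memory only while the
// global in-memory shuffle total stays under the threshold; otherwise
// stream the (big, likely skewed) block to disk to avoid OOM.
final class ShuffleFetchPolicy {
    enum Destination { MEMORY, DISK }

    // Proposed spark.reducer.maxBytesShuffleToMemory limit.
    private final long maxBytesShuffleToMemory;
    // Global in-memory shuffle bytes across all fetcher iterators.
    private long bytesShuffleToMemory = 0;

    ShuffleFetchPolicy(long maxBytesShuffleToMemory) {
        this.maxBytesShuffleToMemory = maxBytesShuffleToMemory;
    }

    // Decide where the next block of `blockSize` bytes should land.
    Destination destinationFor(long blockSize) {
        if (bytesShuffleToMemory + blockSize <= maxBytesShuffleToMemory) {
            bytesShuffleToMemory += blockSize;
            return Destination.MEMORY;  // small block: keep the fast path
        }
        return Destination.DISK;        // big block: spill instead of OOM
    }

    // Called when an in-memory block is fully released.
    void onBlockFreed(long blockSize) {
        bytesShuffleToMemory -= blockSize;
    }
}
```

Under this scheme a skewed block degrades to a disk fetch instead of killing the job, and memory frees up again for later blocks once earlier ones are released.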