Return-Path: X-Original-To: apmail-spark-user-archive@minotaur.apache.org Delivered-To: apmail-spark-user-archive@minotaur.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id 35EDC1830B for ; Fri, 2 Oct 2015 11:38:48 +0000 (UTC) Received: (qmail 83761 invoked by uid 500); 2 Oct 2015 11:38:43 -0000 Delivered-To: apmail-spark-user-archive@spark.apache.org Received: (qmail 83654 invoked by uid 500); 2 Oct 2015 11:38:43 -0000 Mailing-List: contact user-help@spark.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Delivered-To: mailing list user@spark.apache.org Received: (qmail 83644 invoked by uid 99); 2 Oct 2015 11:38:43 -0000 Received: from Unknown (HELO spamd4-us-west.apache.org) (209.188.14.142) by apache.org (qpsmtpd/0.29) with ESMTP; Fri, 02 Oct 2015 11:38:43 +0000 Received: from localhost (localhost [127.0.0.1]) by spamd4-us-west.apache.org (ASF Mail Server at spamd4-us-west.apache.org) with ESMTP id EBCBBC01D9 for ; Fri, 2 Oct 2015 11:38:42 +0000 (UTC) X-Virus-Scanned: Debian amavisd-new at spamd4-us-west.apache.org X-Spam-Flag: NO X-Spam-Score: 2.901 X-Spam-Level: ** X-Spam-Status: No, score=2.901 tagged_above=-999 required=6.31 tests=[DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, HTML_IMAGE_ONLY_32=0.001, HTML_MESSAGE=3, SPF_PASS=-0.001, URIBL_BLOCKED=0.001] autolearn=disabled Authentication-Results: spamd4-us-west.apache.org (amavisd-new); dkim=pass (2048-bit key) header.d=gmail.com Received: from mx1-us-east.apache.org ([10.40.0.8]) by localhost (spamd4-us-west.apache.org [10.40.0.11]) (amavisd-new, port 10024) with ESMTP id Voz2Y6lBwZ1T for ; Fri, 2 Oct 2015 11:38:28 +0000 (UTC) Received: from mail-oi0-f43.google.com (mail-oi0-f43.google.com [209.85.218.43]) by mx1-us-east.apache.org (ASF Mail Server at mx1-us-east.apache.org) with ESMTPS id 18C1E42F8E for ; Fri, 2 Oct 2015 11:38:28 +0000 (UTC) Received: by oixx17 with SMTP id x17so56170869oix.0 for ; Fri, 02 Oct 2015 04:38:20 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :content-type; bh=n4vhr4QnS7O1NvYSWZjLlszeBpE7AQaomhRB46Wk184=; b=ipcqh4rFbysrbUdBhFKnbhPeLFwBVJj7RIhw6efKDnb/Au/jB5ZAOdBqftkIwmnqf4 lisI3rm5WJaNuya1DzzEJ6VI9QyAhmMhRTHL9PSObl8F6I3aiKRldtCQlUnNHx8Ku4zg +B6Ah4W/pj1O9pxuLBbnm70rvMr3AN2f5nQq4iuzXTOUN64B1KVKMQN6uuTSj/8D2MEA sObW+ZZEpSdLqHQtkyrG4qy8mqdUcdtYzwgkB3TNbuqPWiZWcVUv6N/xkcC+qdMqlq+C DxOxcrcdWl5+7wlVcGQlen0qlCBRkbyjaNH717nAWAt00k0NHRAAa8kH3/JOGTOccxV3 DUaQ== X-Received: by 10.202.78.134 with SMTP id c128mr7838758oib.17.1443785900822; Fri, 02 Oct 2015 04:38:20 -0700 (PDT) MIME-Version: 1.0 References: In-Reply-To: From: =?UTF-8?Q?Zolt=C3=A1n_Zvara?= Date: Fri, 02 Oct 2015 11:38:11 +0000 Message-ID: Subject: Re: Shuffle Write v/s Shuffle Read To: Adrian Tanase , Kartik Mathur , user Content-Type: multipart/alternative; boundary=001a11c16c16c017a805211d9951 --001a11c16c16c017a805211d9951 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable Hi, Shuffle output goes to local disk each time, as far as I know, never to memory. On Fri, Oct 2, 2015 at 1:26 PM Adrian Tanase wrote: > I=E2=80=99m not sure this is related to memory management =E2=80=93 the s= huffle is the > central act of moving data around nodes when the computations need the da= ta > on another node (E.g. Group by, sort, etc) > > Shuffle read and shuffle write should be mirrored on the left/right side > of a shuffle between 2 stages. > > -adrian > > From: Kartik Mathur > Date: Thursday, October 1, 2015 at 10:36 PM > To: user > Subject: Shuffle Write v/s Shuffle Read > > Hi > > I am trying to better understand shuffle in spark . > > Based on my understanding thus far , > > *Shuffle Write* : writes stage output for intermediate stage on local > disk if memory is not sufficient., > Example , if each worker has 200 MB memory for intermediate results and > the results are 300MB then , each executer* will keep 200 MB in memory > and will write remaining 100 MB on local disk . * > > *Shuffle Read : *Each executer will read from other executer's *memory + > disk , so total read in above case will be 300(200 from memory and 100 fr= om > disk)*num of executers ? * > > Is my understanding correct ? > > Thanks, > Kartik > --001a11c16c16c017a805211d9951 Content-Type: text/html; charset=UTF-8 Content-Transfer-Encoding: quoted-printable
Hi,

Shuffle output goes to local disk e= ach time, as far as I know, never to memory.

On Fri, Oct 2, 2015 at 1:26 PM Adrian Tanase &= lt;atanase@adobe.com> wrote:
I=E2=80=99m not sure this is related to memory management =E2=80=93 th= e shuffle is the central act of moving data around nodes when the computati= ons need the data on another node (E.g. Group by, sort, etc)

Shuffle read and shuffle write should be mirrored on the left/right si= de of a shuffle between 2 stages.

-adrian

From: Kartik Mathur
Date: Thursday, October 1, 2015 at = 10:36 PM
To: user
Subject: Shuffle Write v/s Shuffle = Read

Hi=C2=A0

I am trying to better understand shuffle in spark .

Based on my understanding thus far ,=C2=A0

Shuffle Write : writes stage output for intermediate sta= ge on local disk if memory is not sufficient.,
Example , if each worker has 200 MB memory for intermediate results an= d the results are 300MB then , each executer=C2=A0will keep 200 MB in me= mory and will write remaining 100 MB on local disk . =C2=A0

Shuffle Read : Each executer will r= ead from other executer's memory + disk , so total read in above case will be 300(200 from memory = and 100 from disk)*num of executers ?=C2=A0=C2=A0

Is my understanding correct ?

Thanks,
Kartik
--001a11c16c16c017a805211d9951--