spark-user mailing list archives

From "Shao, Saisai" <saisai.s...@intel.com>
Subject RE: Shuffle to HDFS
Date Mon, 26 Jan 2015 07:23:28 GMT
Hey Larry,

I don’t think Hadoop puts shuffle output in HDFS. Instead, its behavior is the same
as Spark’s: it stores mapper output (shuffle) data on local disks. You might have
misunderstood something ☺.

Thanks
Jerry
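
[Archive note] For readers who want to control where Spark writes this local shuffle data: the documented setting is spark.local.dir, which defaults to /tmp and accepts a comma-separated list of local directories. A minimal sketch of a spark-defaults.conf entry follows. The /mnt/disk1 and /mnt/disk2 paths are hypothetical placeholders, and on a managed cluster the value may be overridden by SPARK_LOCAL_DIRS (standalone, Mesos) or LOCAL_DIRS (YARN).

```properties
# conf/spark-defaults.conf
# Scratch space for map output (shuffle) files and RDDs spilled to disk.
# The paths below are hypothetical; point them at fast local disks, not HDFS or NFS.
spark.local.dir    /mnt/disk1/spark,/mnt/disk2/spark
```

Listing directories on separate physical disks spreads shuffle I/O across them, which fits the advice in this thread to keep shuffle data on local storage.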

From: Larry Liu [mailto:larryliu05@gmail.com]
Sent: Monday, January 26, 2015 3:03 PM
To: Shao, Saisai
Cc: user@spark.incubator.apache.org
Subject: Re: Shuffle to HDFS

Hi Jerry,

Thanks for your reply.

The reason I ask is that in Hadoop, the mappers' intermediate output (shuffle) will
be stored in HDFS. I think the default location for Spark is /tmp.

Larry

On Sun, Jan 25, 2015 at 9:44 PM, Shao, Saisai <saisai.shao@intel.com> wrote:
Hi Larry,

I don’t think Spark’s current shuffle supports HDFS as a shuffle output location. Is
there any specific reason to spill shuffle data to HDFS or NFS? Doing so would severely
increase the shuffle time.

Thanks
Jerry

From: Larry Liu [mailto:larryliu05@gmail.com]
Sent: Sunday, January 25, 2015 4:45 PM
To: user@spark.incubator.apache.org
Subject: Shuffle to HDFS

How can I change the shuffle output location to HDFS or NFS?
