spark-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Joseph Fourny (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (SPARK-15690) Fast single-node (single-process) in-memory shuffle
Date Thu, 16 Jun 2016 03:01:05 GMT

    [ https://issues.apache.org/jira/browse/SPARK-15690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15332960#comment-15332960
] 

Joseph Fourny edited comment on SPARK-15690 at 6/16/16 3:00 AM:
----------------------------------------------------------------

I am trying to develop single-node clusters on large servers (30+ CPU cores) with 2-3 TB or
RAM. Our use cases involve small to medium size datasets, but with a huge amount of concurrent
jobs (shared, multi-tenant environments). Efficiency and sub-second response times are the
primary requirements. This shuffle between stages is the current bottleneck. Writing anything
to disk is just a waste of time if all computations are done in the same JVM (or a small set
of JVMs on the same machine). We tried using RAMFS to avoid disk I/O, but still a lot of CPU
time is spent in compression and serialization. I would rather not hack my way out of this
one. Is it wishful thinking to have this officially supported?


was (Author: josephfourny):
+1 on this. I am trying to develop single-node clusters on large servers (30+ CPU cores) with
2-3 TB or RAM. Our use cases involve small to medium size datasets, but with a huge amount
of concurrent jobs (shared, multi-tenant environments). Efficiency and sub-second response
times are the primary requirements. This shuffle between stages is the current bottleneck.
Writing anything to disk is just a waste of time if all computations are done in the same
JVM (or a small set of JVMs on the same machine). We tried using RAMFS to avoid disk I/O,
but still a lot of CPU time is spent in compression and serialization. I would rather not
hack my way out of this one. Is it wishful thinking to have this officially supported?

> Fast single-node (single-process) in-memory shuffle
> ---------------------------------------------------
>
>                 Key: SPARK-15690
>                 URL: https://issues.apache.org/jira/browse/SPARK-15690
>             Project: Spark
>          Issue Type: New Feature
>          Components: Shuffle, SQL
>            Reporter: Reynold Xin
>
> Spark's current shuffle implementation sorts all intermediate data by their partition
id, and then write the data to disk. This is not a big bottleneck because the network throughput
on commodity clusters tend to be low. However, an increasing number of Spark users are using
the system to process data on a single-node. When in a single node operating against intermediate
data that fits in memory, the existing shuffle code path can become a big bottleneck.
> The goal of this ticket is to change Spark so it can use in-memory radix sort to do data
shuffling on a single node, and still gracefully fallback to disk if the data size does not
fit in memory. Given the number of partitions is usually small (say less than 256), it'd require
only a single pass do to the radix sort with pretty decent CPU efficiency.
> Note that there have been many in-memory shuffle attempts in the past. This ticket has
a smaller scope (single-process), and aims to actually productionize this code.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org


Mime
View raw message