spark-issues mailing list archives

From "Aaron Davidson (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (SPARK-2468) Netty-based block server / client module
Date Thu, 13 Nov 2014 17:49:34 GMT

    [ https://issues.apache.org/jira/browse/SPARK-2468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14210074#comment-14210074
] 

Aaron Davidson edited comment on SPARK-2468 at 11/13/14 5:48 PM:
-----------------------------------------------------------------

Here is my Spark configuration for the test, 32 cores total (note that this is a test-only configuration
to maximize throughput; I would not recommend these settings for real workloads):

spark.shuffle.io.clientThreads = 16,
spark.shuffle.io.serverThreads = 16,
spark.serializer = "org.apache.spark.serializer.KryoSerializer",
spark.shuffle.blockTransferService = "netty",
spark.shuffle.compress = false,
spark.shuffle.io.maxRetries = 0,
spark.reducer.maxMbInFlight = 512
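
For anyone reproducing the test, the same values could be expressed as a spark-defaults.conf fragment (a sketch copied from the list above; again, test-only tuning, not production advice):

```
# spark-defaults.conf (test-only settings)
spark.serializer                    org.apache.spark.serializer.KryoSerializer
spark.shuffle.blockTransferService  netty
spark.shuffle.compress              false
spark.shuffle.io.clientThreads      16
spark.shuffle.io.serverThreads      16
spark.shuffle.io.maxRetries        0
spark.reducer.maxMbInFlight         512
```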

Forgot to mention, but #3155 now automatically sets spark.shuffle.io.clientThreads and spark.shuffle.io.serverThreads
based on the number of cores allotted to the Executor. You can override them by setting
those properties by hand, but ideally the default behavior is sufficient.
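
The defaulting rule described above can be sketched roughly as follows. This is a hypothetical illustration, not Spark's actual code; the property name is real, but the helper class and method are made up for the example:

```java
import java.util.HashMap;
import java.util.Map;

public class ThreadDefaults {
    // Sketch of the described behavior: use the explicit property if set,
    // otherwise fall back to the number of cores allotted to the executor.
    static int serverThreads(Map<String, String> conf, int numCores) {
        String explicit = conf.get("spark.shuffle.io.serverThreads");
        return explicit != null ? Integer.parseInt(explicit) : numCores;
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        System.out.println(serverThreads(conf, 8));   // no property set: core count
        conf.put("spark.shuffle.io.serverThreads", "16");
        System.out.println(serverThreads(conf, 8));   // explicit override wins
    }
}
```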


was (Author: ilikerps):
Here is my spark configuration (note 32 cores total):
spark.shuffle.io.clientThreads = 16,
spark.shuffle.io.serverThreads = 16,
spark.serializer = "org.apache.spark.serializer.KryoSerializer",
spark.shuffle.blockTransferService = "netty",
spark.shuffle.compress = false,
spark.shuffle.io.maxRetries = 0,
spark.reducer.maxMbInFlight = 512

Forgot to mention, but #3155 now automatically sets spark.shuffle.io.clientThreads and spark.shuffle.io.serverThreads
based on the number of cores allotted to the Executor. You can override them by setting
those properties by hand, but ideally the default behavior is sufficient.

> Netty-based block server / client module
> ----------------------------------------
>
>                 Key: SPARK-2468
>                 URL: https://issues.apache.org/jira/browse/SPARK-2468
>             Project: Spark
>          Issue Type: Improvement
>          Components: Shuffle, Spark Core
>            Reporter: Reynold Xin
>            Assignee: Reynold Xin
>            Priority: Critical
>             Fix For: 1.2.0
>
>
> Right now shuffle send goes through the block manager. This is inefficient because it
> requires loading a block from disk into a kernel buffer, then into a user-space buffer, and
> then back to a kernel send buffer before it reaches the NIC. It makes multiple copies of the
> data and context-switches between kernel and user space. It also creates unnecessary buffers
> in the JVM that increase GC pressure.
> Instead, we should use FileChannel.transferTo, which handles this in kernel space with
> zero-copy. See http://www.ibm.com/developerworks/library/j-zerocopy/
> One potential solution is to use Netty. Spark already has a Netty-based network module
> implemented (org.apache.spark.network.netty). However, it lacks some functionality and is
> turned off by default.
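
For illustration, here is a minimal standalone use of FileChannel.transferTo, the zero-copy API the description refers to. It copies between two files; in the shuffle case the target would be a socket channel, but the calling pattern is the same. The class and file names are made up for the demo:

```java
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class ZeroCopyDemo {
    // Copy src to dst via FileChannel.transferTo, which the kernel can
    // service without copying the data through user-space buffers.
    static void transferCopy(Path src, Path dst) throws Exception {
        try (FileChannel in = FileChannel.open(src, StandardOpenOption.READ);
             FileChannel out = FileChannel.open(dst, StandardOpenOption.CREATE,
                                                StandardOpenOption.WRITE,
                                                StandardOpenOption.TRUNCATE_EXISTING)) {
            long pos = 0, size = in.size();
            // transferTo may move fewer bytes than requested, so loop until done.
            while (pos < size) {
                pos += in.transferTo(pos, size - pos, out);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        Path src = Files.createTempFile("block", ".dat");
        Path dst = Files.createTempFile("copy", ".dat");
        Files.write(src, "shuffle block bytes".getBytes());
        transferCopy(src, dst);
        System.out.println(new String(Files.readAllBytes(dst)));
    }
}
```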



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org

