spark-issues mailing list archives

From "Wenchen Fan (JIRA)" <j...@apache.org>
Subject [jira] [Resolved] (SPARK-20640) Make rpc timeout and retry for shuffle registration configurable
Date Wed, 21 Jun 2017 13:56:00 GMT

     [ https://issues.apache.org/jira/browse/SPARK-20640?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wenchen Fan resolved SPARK-20640.
---------------------------------
       Resolution: Fixed
    Fix Version/s: 2.3.0

Issue resolved by pull request 18092
[https://github.com/apache/spark/pull/18092]

> Make rpc timeout and retry for shuffle registration configurable
> ----------------------------------------------------------------
>
>                 Key: SPARK-20640
>                 URL: https://issues.apache.org/jira/browse/SPARK-20640
>             Project: Spark
>          Issue Type: Bug
>          Components: Shuffle
>    Affects Versions: 2.0.2
>            Reporter: Sital Kedia
>             Fix For: 2.3.0
>
>
> Currently the shuffle service registration timeout and retry count are hardcoded (see
> https://github.com/sitalkedia/spark/blob/master/network/shuffle/src/main/java/org/apache/spark/network/shuffle/ExternalShuffleClient.java#L144
> and https://github.com/sitalkedia/spark/blob/master/core/src/main/scala/org/apache/spark/storage/BlockManager.scala#L197).
> This works well for small workloads, but under heavy load, when the shuffle service is busy
> transferring large amounts of data, it responds to registration requests with significant
> delay. As a result, executors often fail to register with the shuffle service, which
> eventually fails the job. We need to make these two parameters configurable.
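
For illustration only, the idea of a registration call bounded by a configurable per-attempt timeout and a configurable number of attempts might be sketched as below. This is not the code from pull request 18092; the class RegistrationRetry, the method registerWithRetry, and all parameter names are hypothetical. As far as I know, the settings that shipped with the fix in 2.3.0 are spark.shuffle.registration.timeout and spark.shuffle.registration.maxAttempts, but this message does not name them, so check the configuration docs for your version.

```java
import java.io.IOException;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical sketch: not Spark's implementation.
public class RegistrationRetry {

    /**
     * Runs one registration attempt at a time, waiting at most timeoutMs for
     * each, and gives up after maxAttempts. Returns the 1-based number of the
     * attempt that succeeded.
     */
    public static int registerWithRetry(Callable<Void> attempt,
                                        long timeoutMs,
                                        int maxAttempts,
                                        long sleepBetweenMs)
            throws IOException, InterruptedException {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        // A cached pool lets a new attempt start even if a timed-out one
        // has not yet reacted to its cancellation.
        ExecutorService pool = Executors.newCachedThreadPool();
        try {
            Exception last = null;
            for (int i = 1; i <= maxAttempts; i++) {
                Future<Void> pending = pool.submit(attempt);
                try {
                    pending.get(timeoutMs, TimeUnit.MILLISECONDS);
                    return i; // registered
                } catch (TimeoutException | ExecutionException e) {
                    pending.cancel(true); // interrupt the slow attempt
                    last = e;
                    if (i < maxAttempts) {
                        Thread.sleep(sleepBetweenMs);
                    }
                }
            }
            throw new IOException(
                "Registration failed after " + maxAttempts + " attempts", last);
        } finally {
            pool.shutdownNow();
        }
    }
}
```

With both values passed in by the caller, an operator running a busy shuffle service could raise the timeout or the attempt count without a code change, which is the behavior this issue asks for.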



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org

