spark-reviews mailing list archives

From GitBox <>
Subject [GitHub] [spark] Victsm commented on issue #27665: [SPARK-24355][Core][FOLLOWUP] Add flag for fetching chunks in async mode
Date Mon, 24 Feb 2020 20:04:23 GMT
Victsm commented on issue #27665: [SPARK-24355][Core][FOLLOWUP] Add flag for fetching chunks
in async mode
   @xuanyuanking we worked on the original fix for this issue. Having the await in there is
key to the benefits provided by SPARK-24355, which improves the reliability of Spark shuffle in
a reasonably scaled deployment. This issue seems common across companies like us (LinkedIn),
Netflix, Uber, and Yahoo. As mentioned in #22173, what we observed is that when HDDs are
used for shuffle storage, the disk is saturated before the network can be saturated.
So, for a reasonably scaled deployment, this fix provides a boost in shuffle reliability
without hurting performance much. This was also validated by @tgravescs in the
Yahoo deployment of this patch.
   It's reasonable to introduce another config that disables this reliability improvement
if it leads to a performance regression in certain deployment modes. We just want to see whether
we should leave it enabled by default. Also, as mentioned in #22173, we have discovered
a potential fix for this perf regression that does not remove the reliability benefits.
It will take some extra time on our side to evaluate that fix, which is a fix inside Netty.
We want to make sure the broader community knows what we have been doing on this issue, so that
we do not take away a potential reliability improvement to Spark.

