spark-reviews mailing list archives

From GitBox <...@apache.org>
Subject [GitHub] [spark] squito commented on a change in pull request #25299: [SPARK-27651][Core] Avoid the network when shuffle blocks are fetched from the same host
Date Thu, 01 Aug 2019 15:55:44 GMT
squito commented on a change in pull request #25299: [SPARK-27651][Core] Avoid the network
when shuffle blocks are fetched from the same host
URL: https://github.com/apache/spark/pull/25299#discussion_r309769700
 
 

 ##########
 File path: core/src/main/scala/org/apache/spark/storage/ShuffleBlockFetcherIterator.scala
 ##########
 @@ -365,12 +387,46 @@ final class ShuffleBlockFetcherIterator(
     }
   }
 
+
+  /**
+   * Fetch the host-local blocks while we are fetching remote blocks. This is ok because
+   * `ManagedBuffer`'s memory is allocated lazily when we create the input stream, so all we
+   * track in-memory are the ManagedBuffer references themselves.
+   */
+  private[this] def fetchHostLocalBlocks() {
+    logDebug(s"Start fetching host-local blocks: ${hostLocalBlocks.mkString(", ")}")
+
+    val localDirsByExec =
+      blockManager.master.getHostLocalDirs(hostLocalBlocksByExecutor.keySet.toArray).localDirs
 
 Review comment:
   The only reason I can see for ever turning this feature off is the extra round trip to the
driver required here. But instead of doing this once per task, could we add a cache on each
executor for this data? The data shouldn't change much, but I guess you would need to limit the
cache size, since in theory executors could come and go a lot.
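
   To make the suggestion concrete, here is a rough sketch of what such a bounded, per-executor
cache could look like. Everything in it is hypothetical and not taken from the PR: the class
name `HostLocalDirsCache`, the `getOrFetch` method, the `fetchFromDriver` callback (standing in
for the `getHostLocalDirs` call above), and the use of `Seq[String]` for the directories. It uses
simple LRU eviction via `java.util.LinkedHashMap` so the cache stays small even if many executors
come and go.

```scala
import java.util.{LinkedHashMap => JLinkedHashMap, Map => JMap}

// Hypothetical helper, NOT part of Spark: a size-bounded LRU cache that maps
// executor IDs to their local directories, so the driver is only asked about
// executors this executor has not seen recently.
final class HostLocalDirsCache(maxEntries: Int) {
  require(maxEntries > 0, "maxEntries must be positive")

  // accessOrder = true keeps entries ordered from least to most recently used;
  // removeEldestEntry evicts the least recently used entry once the bound is exceeded.
  private val cache =
    new JLinkedHashMap[String, Seq[String]](16, 0.75f, true) {
      override def removeEldestEntry(eldest: JMap.Entry[String, Seq[String]]): Boolean =
        this.size() > maxEntries
    }

  /**
   * Returns the local dirs for the given executor IDs. `fetchFromDriver` is an
   * assumed callback that is invoked at most once, and only for the IDs that
   * are not cached. For simplicity the lock is held during the fetch.
   */
  def getOrFetch(
      execIds: Seq[String],
      fetchFromDriver: Seq[String] => Map[String, Seq[String]]): Map[String, Seq[String]] =
    synchronized {
      val ids = execIds.distinct
      val hits = ids.flatMap(id => Option(cache.get(id)).map(dirs => id -> dirs)).toMap
      val misses = ids.filterNot(hits.contains)
      val fetched =
        if (misses.isEmpty) Map.empty[String, Seq[String]] else fetchFromDriver(misses)
      fetched.foreach { case (id, dirs) => cache.put(id, dirs) }
      hits ++ fetched
    }

  /** Number of executors currently cached. */
  def entryCount: Int = synchronized(cache.size())
}
```

   With something like this, a task would only pay the driver round trip for executors that are
not already cached, and the size bound caps the memory used when executors churn.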

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

