spark-issues mailing list archives

From "ASF GitHub Bot (JIRA)" <>
Subject [jira] [Commented] (SPARK-24920) Spark should allow sharing netty's memory pools across all uses
Date Tue, 11 Dec 2018 19:33:01 GMT


ASF GitHub Bot commented on SPARK-24920:

ankuriitg commented on a change in pull request #23278: [SPARK-24920][Core] Allow sharing
Netty's memory pool allocators

 File path: common/network-common/src/main/java/org/apache/spark/network/util/
 @@ -95,6 +111,38 @@ public static String getRemoteAddress(Channel channel) {
     return "<unknown remote>";
   }

+  /**
+   * Returns the default number of threads for both the Netty client and server thread pools.
+   * If numUsableCores is 0, we will use Runtime to get an approximate number of available cores.
+   */
+  public static int defaultNumThreads(int numUsableCores) {
+    final int availableCores;
+    if (numUsableCores > 0) {
+      availableCores = numUsableCores;
+    } else {
+      availableCores = Runtime.getRuntime().availableProcessors();
+    }
+    return Math.min(availableCores, MAX_DEFAULT_NETTY_THREADS);
+  }
+
+  /**
+   * Returns the lazily created shared pooled ByteBuf allocator for the specified allowCache
+   * parameter value.
+   */
+  public static synchronized PooledByteBufAllocator getSharedPooledByteBufAllocator(
 Review comment:
   Maybe use double-checked locking instead of method synchronization, since the instantiation
just needs to happen once but this may unnecessarily block all later calls.
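The double-checked locking pattern the reviewer suggests can be sketched as below. This is a hedged illustration, not the PR's actual code: the `SharedAllocatorHolder` class name is invented, and a plain `Object` stands in for Netty's `PooledByteBufAllocator` so the sketch is self-contained.

```java
// Sketch of double-checked locking for a lazily created shared allocator.
// SharedAllocatorHolder is a hypothetical name; Object stands in for
// io.netty.buffer.PooledByteBufAllocator to keep the example dependency-free.
public final class SharedAllocatorHolder {

  // volatile is essential: it guarantees safe publication of the fully
  // constructed instance to threads that skip the synchronized block.
  private static volatile Object sharedAllocator;

  private SharedAllocatorHolder() {}

  public static Object getSharedAllocator() {
    Object result = sharedAllocator;
    if (result == null) {                      // first check, lock-free
      synchronized (SharedAllocatorHolder.class) {
        result = sharedAllocator;
        if (result == null) {                  // second check, under lock
          result = new Object();               // create the shared pool once
          sharedAllocator = result;
        }
      }
    }
    return result;
  }
}
```

After the first call, every subsequent call costs only a volatile read, which is the reviewer's point: method-level synchronization would serialize all callers indefinitely just to protect a one-time initialization.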

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:

> Spark should allow sharing netty's memory pools across all uses
> ---------------------------------------------------------------
>                 Key: SPARK-24920
>                 URL:
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>    Affects Versions: 2.4.0
>            Reporter: Imran Rashid
>            Priority: Major
>              Labels: memory-analysis
> Spark currently creates separate netty memory pools for each of the following "services":
> 1) RPC Client
> 2) RPC Server
> 3) BlockTransfer Client
> 4) BlockTransfer Server
> 5) ExternalShuffle Client
> Depending on configuration and whether it's an executor or driver JVM, a different subset
> of these is active, but it's always either 3 or 4.
> Having them independent somewhat defeats the purpose of using pools at all.  In my experiments
> I've found each pool will grow due to a burst of activity in the related service (e.g. task
> start / end msgs), followed by another burst in a different service (e.g. sending torrent
> broadcast blocks).  Because of the way these pools work, they allocate memory in large chunks
> (16 MB by default) for each netty thread, so there is often a surge of 128 MB of allocated
> memory, even for really tiny messages.  Also a lot of this memory is offheap by default,
> which makes it even tougher for users to manage.
> I think it would make more sense to combine all of these into a single pool.  In some
> experiments I tried, this noticeably decreased memory usage, both onheap and offheap (no
> significant performance effect in my small experiments).
> As this is a pretty core change, as a first step I'd propose just exposing this as a
> conf, to let users experiment more broadly across a wider range of workloads.
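The arithmetic behind the 128 MB surge described above can be checked with a small sketch. The constants mirror Netty's default 16 MB chunk size and Spark's default cap of 8 netty threads (`MAX_DEFAULT_NETTY_THREADS`); the class and method names are illustrative, not Spark code.

```java
// Illustrative only: PoolOverheadExample is not Spark code. It reproduces
// the arithmetic in the issue: 16 MB chunks * 8 netty threads = 128 MB of
// allocated memory per pool, multiplied by the number of active pools.
public final class PoolOverheadExample {

  static final int CHUNK_SIZE_MB = 16;  // Netty's default chunk size
  static final int NETTY_THREADS = 8;   // Spark's MAX_DEFAULT_NETTY_THREADS

  // Memory a single pool can pin after one burst of activity.
  static int perPoolSurgeMb() {
    return CHUNK_SIZE_MB * NETTY_THREADS;
  }

  // Worst case across all independent pools (3 or 4 are active per JVM).
  static int worstCaseMb(int activePools) {
    return perPoolSurgeMb() * activePools;
  }
}
```

With 4 active pools that is 512 MB of pooled memory potentially held for tiny messages, which is why sharing one allocator across the services pays off.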

This message was sent by Atlassian JIRA

