hudi-commits mailing list archives

From GitBox <...@apache.org>
Subject [GitHub] [incubator-hudi] vinothchandar commented on a change in pull request #1418: [HUDI-678] Make config package spark free
Date Wed, 25 Mar 2020 22:48:44 GMT
vinothchandar commented on a change in pull request #1418: [HUDI-678] Make config package spark free
URL: https://github.com/apache/incubator-hudi/pull/1418#discussion_r398217268
 
 

 ##########
 File path: hudi-client/src/main/java/org/apache/hudi/config/HoodieMemoryConfig.java
 ##########
 @@ -113,40 +112,8 @@ public Builder withWriteStatusFailureFraction(double failureFraction) {
       return this;
     }
 
 -    /**
 -     * Dynamic calculation of max memory to use for spillable map. user.available.memory = spark.executor.memory *
 -     * (1 - spark.memory.fraction); spillable.available.memory = user.available.memory * hoodie.memory.fraction.
 -     * Anytime the spark.executor.memory or the spark.memory.fraction is changed, the memory used for spillable
 -     * map changes accordingly.
 -     */
     private long getMaxMemoryAllowedForMerge(String maxMemoryFraction) {
-      final String SPARK_EXECUTOR_MEMORY_PROP = "spark.executor.memory";
-      final String SPARK_EXECUTOR_MEMORY_FRACTION_PROP = "spark.memory.fraction";
-      // This is hard-coded in spark code {@link
-      // https://github.com/apache/spark/blob/576c43fb4226e4efa12189b41c3bc862019862c6/core/src/main/scala/org/apache/
-      // spark/memory/UnifiedMemoryManager.scala#L231} so have to re-define this here
-      final String DEFAULT_SPARK_EXECUTOR_MEMORY_FRACTION = "0.6";
-      // This is hard-coded in spark code {@link
-      // https://github.com/apache/spark/blob/576c43fb4226e4efa12189b41c3bc862019862c6/core/src/main/scala/org/apache/
-      // spark/SparkContext.scala#L471} so have to re-define this here
-      final String DEFAULT_SPARK_EXECUTOR_MEMORY_MB = "1024"; // in MB
-
-      if (SparkEnv.get() != null) {
-        // 1 GB is the default conf used by Spark, look at SparkContext.scala
 -        long executorMemoryInBytes = Utils.memoryStringToMb(
 -            SparkEnv.get().conf().get(SPARK_EXECUTOR_MEMORY_PROP, DEFAULT_SPARK_EXECUTOR_MEMORY_MB)) * 1024 * 1024L;
-        // 0.6 is the default value used by Spark,
-        // look at {@link
-        // https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/SparkConf.scala#L507}
-        double memoryFraction = Double.parseDouble(
-            SparkEnv.get().conf().get(SPARK_EXECUTOR_MEMORY_FRACTION_PROP, DEFAULT_SPARK_EXECUTOR_MEMORY_FRACTION));
-        double maxMemoryFractionForMerge = Double.parseDouble(maxMemoryFraction);
-        double userAvailableMemory = executorMemoryInBytes * (1 - memoryFraction);
-        long maxMemoryForMerge = (long) Math.floor(userAvailableMemory * maxMemoryFractionForMerge);
-        return Math.max(DEFAULT_MIN_MEMORY_FOR_SPILLABLE_MAP_IN_BYTES, maxMemoryForMerge);
-      } else {
-        return DEFAULT_MAX_MEMORY_FOR_SPILLABLE_MAP_IN_BYTES;
-      }
+      return ConfigUtils.getMaxMemoryAllowedForMerge(props, maxMemoryFraction);
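
 For context, the formula in the removed Javadoc works out as follows with Spark's documented defaults (1 GB executor memory, spark.memory.fraction = 0.6); the hoodie.memory.fraction value of 0.6 below is an assumed example, not a value taken from this PR:

   // Worked example of the removed calculation (inputs assumed, see above).
   public class SpillableMemoryExample {
     public static void main(String[] args) {
       long executorMemoryInBytes = 1024L * 1024 * 1024;   // 1 GB, Spark's default
       double sparkMemoryFraction = 0.6;                    // spark.memory.fraction default
       double maxMemoryFractionForMerge = 0.6;              // assumed hoodie.memory.fraction
       double userAvailableMemory = executorMemoryInBytes * (1 - sparkMemoryFraction);
       long maxMemoryForMerge = (long) Math.floor(userAvailableMemory * maxMemoryFractionForMerge);
       System.out.println(maxMemoryForMerge);  // prints 257698037, roughly 245 MB
     }
   }

 The actual method additionally takes the max of this result and DEFAULT_MIN_MEMORY_FOR_SPILLABLE_MAP_IN_BYTES, so small executors still get a usable spillable map.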
 
 Review comment:
   Okay.. if we call a Spark-specific class here, then this does not achieve the purpose, right?

   i.e. you cannot move `ConfigUtils` to hudi-spark and keep the config in `hudi-writer-common` without making hudi-writer-common depend on `ConfigUtils`?

   We should change the caller of `getMaxMemoryAllowedForMerge` to use `ConfigUtils.getXX()`, just like how you did for the storage level?
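
   A minimal sketch of the direction suggested above, with the Spark-aware caller resolving the byte budget itself; the call site and accessor names here are illustrative assumptions, not code from the PR:

     // Sketch only: HoodieMemoryConfig (spark-free) keeps just the configured
     // fraction, and a Spark-aware caller converts it to bytes via ConfigUtils.
     long maxMemoryForMerge = ConfigUtils.getMaxMemoryAllowedForMerge(
         writeConfig.getProps(),                                       // hoodie.* properties (assumed accessor)
         writeConfig.getString(MAX_MEMORY_FRACTION_FOR_MERGE_PROP));   // configured fraction (assumed key)
     // ... size the spillable map used during merge with maxMemoryForMerge ...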

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services
