From GitBox <...@apache.org>
Subject [GitHub] [spark] HeartSaVioR commented on a change in pull request #26416: [SPARK-29779][CORE] Compact old event log files and cleanup
Date Tue, 10 Dec 2019 02:58:37 GMT
HeartSaVioR commented on a change in pull request #26416: [SPARK-29779][CORE] Compact old event log files and cleanup
URL: https://github.com/apache/spark/pull/26416#discussion_r355822917
 
 

 ##########
 File path: docs/configuration.md
 ##########
 @@ -1023,6 +1023,24 @@ Apart from these, the following properties are also available, and may be useful
     The max size of event log file before it's rolled over.
   </td>
 </tr>
+<tr>
+  <td><code>spark.eventLog.rolling.maxFilesToRetain</code></td>
+  <td>Int.MaxValue</td>
+  <td>
+    The maximum number of event log files which will be retained as non-compacted.
+    By default, all event log files will be retained. Please set the configuration and
+    <code>spark.eventLog.rolling.maxFileSize</code> accordingly if you want to control
+    the overall size of event log files. The event log files older than these retained
+    files will be compacted into single file and deleted afterwards.<br/>
+    NOTE 1: Compaction will happen in Spark History Server, which means the same value
+    will be applied across applications which are being loaded in Spark History Server,
+    as well as compaction and cleanup would require running Spark History Server.<br/>
+    NOTE 2: Spark History Server may not compact the old event log files if it figures
+    out compaction on event log for such application won't reduce the size at expected
+    rate threshold. For streaming query (including Structured Streaming) we normally
+    expect compaction will run, but for batch query compaction won't run in most cases.
 
 Review comment:
   No, I don't expect compaction to run for batch queries in most cases, since we measure the acceptance rate and skip compaction if the rate is low. (That's a new change reflecting your suggestion.)
   It might happen if multiple "short" batch queries run in the same driver process, but apart from jobserver-like applications, I'm not sure that is a major case for batch queries.
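
   To illustrate the kind of check described above, here is a minimal sketch in Python. It is not the code from this pull request: the function name `should_compact`, its parameters, and the definition of the rate as "fraction of events that compaction would drop" are all assumptions made for illustration, since the comment does not spell out how the rate is computed or what threshold is used.

```python
def should_compact(total_events: int, droppable_events: int,
                   min_reduction_rate: float) -> bool:
    """Decide whether compaction looks worthwhile (hypothetical helper).

    total_events: number of events in the old event log files.
    droppable_events: number of events compaction would be able to drop
        (assumption: e.g. events that belong to already finished jobs).
    min_reduction_rate: hypothetical threshold in [0, 1]; compaction is
        skipped when the expected reduction falls below it.
    """
    if total_events <= 0:
        # Nothing to compact.
        return False
    reduction_rate = droppable_events / total_events
    return reduction_rate >= min_reduction_rate


# Streaming-like application: most old events can be dropped.
print(should_compact(total_events=10_000, droppable_events=9_000,
                     min_reduction_rate=0.5))

# Batch-like application: few old events can be dropped, so skip.
print(should_compact(total_events=10_000, droppable_events=500,
                     min_reduction_rate=0.5))
```

   Under these assumptions, a long-running streaming query would usually pass the check, while a single batch query would usually not, which matches the expectation stated in NOTE 2.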
