spark-reviews mailing list archives

From GitBox <...@apache.org>
Subject [GitHub] [spark] HeartSaVioR commented on a change in pull request #26502: [SPARK-29876][SS] Delete/archive file source completed files in separate thread
Date Mon, 18 Nov 2019 02:13:36 GMT
HeartSaVioR commented on a change in pull request #26502: [SPARK-29876][SS] Delete/archive
file source completed files in separate thread
URL: https://github.com/apache/spark/pull/26502#discussion_r347179731
 
 

 ##########
 File path: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/FileStreamSource.scala
 ##########
 @@ -342,7 +344,14 @@ object FileStreamSource {
     def size: Int = map.size()
   }
 
-  private[sql] trait FileStreamSourceCleaner {
+  private[sql] abstract class FileStreamSourceCleaner {
+    protected val cleanThreadPool = ThreadUtils.newDaemonCachedThreadPool(
+      "file-source-cleaner-threadpool",
+      SQLConf.get.getConf(SQLConf.FILE_SOURCE_CLEANER_NUM_THREADS)
+    )
+
+    def stop(): Unit = cleanThreadPool.shutdown()
 
 Review comment:
   > The main intention is to free up source side not to have horror execution time of
directory listing which still stands.
   
  You may want to revisit the description of [SPARK-20568](https://issues.apache.org/jira/browse/SPARK-20568)
as well as the comments in the issue.
   
  The original problem is "disk space": if we fail to delete a file, we may want to tell users
which files can be deleted manually without breaking query execution. Reducing the cost of listing
source files is a side effect, not the main goal. Yes, that side effect is a major benefit, but
it is still not the main goal.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

