spark-reviews mailing list archives

From GitBox <...@apache.org>
Subject [GitHub] [spark] gengliangwang commented on issue #26671: Revert "[SPARK-26081][SPARK-29999]"
Date Tue, 26 Nov 2019 21:47:18 GMT
gengliangwang commented on issue #26671: Revert "[SPARK-26081][SPARK-29999]"
URL: https://github.com/apache/spark/pull/26671#issuecomment-558831120
 
 
   @HeartSaVioR I have updated the PR description from
   ```
   We found a bug in SPARK-26081, and SPARK-29999 was proposed to fix it, but we decided to revert both because it's too costly to apply SPARK-29999 on top of SPARK-26081; SPARK-26081 may be resubmitted if there's a viable approach for dealing with the bug.
   ```
   to
   ```
   For Spark file sources, in case of an empty job, we still write out the first partition so that metadata is saved for file formats like Parquet.
   After the changes in SPARK-26081, CSV/JSON/TEXT won't output an empty file for an empty job. This optimization causes a problem in `ManifestFileCommitProtocol`: the API `newTaskTempFile` is called without an actual file being created. Then `fs.getFileStatus` throws `FileNotFoundException` because the file does not exist.
   
   SPARK-29999 fixes the problem, but checking file existence on each task commit is too costly. We should simply restore the behavior from before SPARK-26081.
   ```
   
   I made this change so that the context is more straightforward for developers.
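As an illustration of the failure mode in the updated description, here is a minimal Python sketch. `ManifestLikeCommitProtocol`, `new_task_temp_file`, and `commit_task` are hypothetical stand-ins, not Spark's Scala API; the sketch only assumes that the protocol records every path it hands out and stats each one at task commit, and that the `check_existence` flag roughly resembles the SPARK-29999 approach.

```python
import os
import tempfile


class ManifestLikeCommitProtocol:
    """Hypothetical, simplified stand-in for a manifest-based commit protocol.

    It records every path handed out by new_task_temp_file() and, at task
    commit, stats each recorded path to build (path, size) manifest entries.
    """

    def __init__(self, check_existence=False):
        # check_existence=True mimics an existence check on every task commit
        # (assumed here to resemble the SPARK-29999 approach).
        self.check_existence = check_existence
        self.added_files = []

    def new_task_temp_file(self, directory, name):
        # Only returns a path; the writer may never create the file
        # (e.g. an empty partition when file creation is deferred).
        path = os.path.join(directory, name)
        self.added_files.append(path)
        return path

    def commit_task(self):
        paths = self.added_files
        if self.check_existence:
            # One extra filesystem call per recorded path.
            paths = [p for p in paths if os.path.exists(p)]
        # os.stat raises FileNotFoundError for a path that was never created.
        return [(p, os.stat(p).st_size) for p in paths]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as out_dir:
        protocol = ManifestLikeCommitProtocol()
        protocol.new_task_temp_file(out_dir, "part-00000.json")  # never written
        try:
            protocol.commit_task()
        except FileNotFoundError:
            print("task commit failed: recorded file does not exist")
```

The existence check avoids the exception, but it adds one filesystem call per recorded file on every task commit, which is the kind of cost the comment weighs against simply restoring the pre-SPARK-26081 behavior.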

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

