spark-issues mailing list archives

From "Apache Spark (JIRA)" <j...@apache.org>
Subject [jira] [Assigned] (SPARK-26081) Do not write empty files by text datasources
Date Thu, 15 Nov 2018 21:17:00 GMT

     [ https://issues.apache.org/jira/browse/SPARK-26081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-26081:
------------------------------------

    Assignee: Apache Spark

> Do not write empty files by text datasources
> --------------------------------------------
>
>                 Key: SPARK-26081
>                 URL: https://issues.apache.org/jira/browse/SPARK-26081
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 2.4.0
>            Reporter: Maxim Gekk
>            Assignee: Apache Spark
>            Priority: Minor
>
> Text-based datasources such as CSV, JSON and Text produce empty files for empty partitions.
> This introduces additional overhead when opening and reading such files back. In the current
> implementation of OutputWriter, the output stream is created eagerly even if no records are
> written to the stream. So, creation can be postponed until the first write.
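
The idea of postponing stream creation until the first write can be illustrated with a minimal sketch. This is not Spark's OutputWriter code; the names LazyTextWriter and write_partition are hypothetical, and plain Python file I/O stands in for the real output stream. It only shows that an empty partition leaves no file behind when the stream is opened lazily.

```python
import os
import tempfile


class LazyTextWriter:
    """Hypothetical writer that creates its output file on the first write."""

    def __init__(self, path):
        self.path = path
        self._stream = None  # nothing is opened at construction time

    def write(self, record):
        if self._stream is None:
            # First record for this partition: create the file only now.
            self._stream = open(self.path, "w", encoding="utf-8")
        self._stream.write(record + "\n")

    def close(self):
        # If no record was ever written, there is no stream and no file.
        if self._stream is not None:
            self._stream.close()
            self._stream = None


def write_partition(path, records):
    """Write all records of one partition; report whether a file was created."""
    writer = LazyTextWriter(path)
    try:
        for record in records:
            writer.write(record)
    finally:
        writer.close()
    return os.path.exists(path)


if __name__ == "__main__":
    out_dir = tempfile.mkdtemp()
    print(write_partition(os.path.join(out_dir, "part-0.txt"), []))
    print(write_partition(os.path.join(out_dir, "part-1.txt"), ["a", "b"]))
```

With an eager writer the open call would sit in the constructor, so every partition, including an empty one, would produce a file.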



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org

