spark-issues mailing list archives

From "Daniel Mateus Pires (JIRA)" <j...@apache.org>
Subject [jira] [Created] (SPARK-25480) Dynamic partitioning + saveAsTable with multiple partition columns create empty directory
Date Thu, 20 Sep 2018 10:00:00 GMT
Daniel Mateus Pires created SPARK-25480:
-------------------------------------------

             Summary: Dynamic partitioning + saveAsTable with multiple partition columns create empty directory
                 Key: SPARK-25480
                 URL: https://issues.apache.org/jira/browse/SPARK-25480
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 2.3.0
            Reporter: Daniel Mateus Pires


We use .saveAsTable and dynamic partitioning as our only way to write data to S3 from Spark.

When only 1 partition column is defined for a table, .saveAsTable behaves as expected:
 - with Overwrite mode, it creates the table if it doesn't exist and writes the data
 - with Append mode, it appends to the given partition
 - with Overwrite mode, if the table already exists, it overwrites the partition

If 2 partition columns are used, however, the directory is created on S3 with the _SUCCESS file,
but no data is actually written.

Our workaround is to check whether the table exists and, if it does not, set the partition
overwrite mode back to static before running saveAsTable:
{code}
spark.conf.set("spark.sql.sources.partitionOverwriteMode", "dynamic")
df.write.mode("overwrite").partitionBy("year", "month").option("path", "s3://hbc-data-warehouse/integration/users_test").saveAsTable("users_test")
{code}
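
The workaround described above (static mode for the first write, dynamic mode once the table exists) could be sketched roughly as follows. This is only an illustration under assumptions: the helper names table_exists, choose_overwrite_mode and save_partitioned are hypothetical and not part of Spark. The existence check uses spark.catalog.listTables() against the current database, and spark and df are assumed to be an existing SparkSession and DataFrame. Whether this avoids the empty directory on versions other than the reported 2.3.0 is not confirmed here.

```python
# Sketch of the workaround: fall back to static partition overwrite when the
# target table does not exist yet. All helper names here are hypothetical.

OVERWRITE_MODE_KEY = "spark.sql.sources.partitionOverwriteMode"


def table_exists(spark, table_name):
    """Return True if table_name is listed in the session's current database."""
    wanted = table_name.lower()
    return any(t.name.lower() == wanted for t in spark.catalog.listTables())


def choose_overwrite_mode(exists):
    """Dynamic overwrite for an existing table, static for the first write."""
    return "dynamic" if exists else "static"


def save_partitioned(spark, df, table_name, path, partition_cols):
    """Set the overwrite mode for this write, then call saveAsTable."""
    mode = choose_overwrite_mode(table_exists(spark, table_name))
    spark.conf.set(OVERWRITE_MODE_KEY, mode)
    (df.write
       .mode("overwrite")
       .partitionBy(*partition_cols)
       .option("path", path)
       .saveAsTable(table_name))
    return mode
```

With the values from this report, the call would be save_partitioned(spark, df, "users_test", "s3://hbc-data-warehouse/integration/users_test", ["year", "month"]). The function returns the mode it selected so the caller can log it.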
 




