spark-issues mailing list archives

From "Xiao Li (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SPARK-18931) Create empty staging directory in partitioned table on insert
Date Mon, 26 Dec 2016 04:23:58 GMT

    [ https://issues.apache.org/jira/browse/SPARK-18931?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15777543#comment-15777543 ]

Xiao Li commented on SPARK-18931:
---------------------------------

Yeah. The PR https://github.com/apache/spark/pull/16399 backports the fix to Spark 2.0. 

> Create empty staging directory in partitioned table on insert
> -------------------------------------------------------------
>
>                 Key: SPARK-18931
>                 URL: https://issues.apache.org/jira/browse/SPARK-18931
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.0.2
>            Reporter: Egor Pahomov
>
> CREATE TABLE temp.test_partitioning_4 (
>   num string
> )
> PARTITIONED BY (
>   day string)
> stored as parquet
> On every 
> INSERT INTO TABLE temp.test_partitioning_4 PARTITION (day)
> select day, count(*) as num from 
> hss.session where year=2016 and month=4 
> group by day
> a new directory ".hive-staging_hive_2016-12-19_15-55-11_298_3412488541559534475-4" is created
> on HDFS. This is a big issue, because I insert every day, and a bunch of empty directories on
> HDFS is very bad for HDFS.
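
For anyone who wants to check how many of these leftover directories a table has accumulated, a minimal Python sketch follows. It only filters a list of paths by name; it does not talk to HDFS or delete anything. The name pattern is an assumption inferred from the single directory name quoted in the report above, and the function name and warehouse paths are hypothetical.

```python
import posixpath
import re

# Assumed name layout, inferred from the one example in the report:
# .hive-staging_hive_<yyyy-MM-dd>_<HH-mm-ss>_<digits>_<digits>-<digits>
STAGING_NAME = re.compile(
    r"^\.hive-staging_hive_\d{4}-\d{2}-\d{2}_\d{2}-\d{2}-\d{2}_\d+_\d+-\d+$"
)


def find_staging_dirs(paths):
    """Return the paths whose last component looks like a Hive staging directory."""
    return [
        p for p in paths
        if STAGING_NAME.match(posixpath.basename(p.rstrip("/")))
    ]


if __name__ == "__main__":
    # Hypothetical listing of a table directory; only the staging name is from the report.
    listing = [
        "/warehouse/temp.db/test_partitioning_4/day=2016-04-01",
        "/warehouse/temp.db/test_partitioning_4/"
        ".hive-staging_hive_2016-12-19_15-55-11_298_3412488541559534475-4",
    ]
    for path in find_staging_dirs(listing):
        print(path)
```

The listing would normally come from whatever tool is already used to list the table directory. Matching on the full name rather than just the ".hive-staging" prefix is a deliberate choice to avoid flagging unrelated hidden directories.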



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org

