hive-issues mailing list archives

From "Lefty Leverenz (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-7810) Insert overwrite table query has strange behavior when set hive.optimize.union.remove=true [Spark Branch]
Date Sat, 30 May 2015 21:34:17 GMT

    [ https://issues.apache.org/jira/browse/HIVE-7810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14566207#comment-14566207 ]

Lefty Leverenz commented on HIVE-7810:
--------------------------------------

Adding TODOC15 (which means TODOC1.1.0).

Besides documenting *hive.merge.sparkfiles* in Configuration Properties, usage notes should
be included in the HoS doc.  Also see HIVE-8043, Support merging small files.

* [Hive on Spark: Getting Started | https://cwiki.apache.org/confluence/display/Hive/Hive+on+Spark%3A+Getting+Started]
* [Configuration Properties -- Spark | https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-Spark]
with cross-references to and from:
** [hive.merge.mapfiles | https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-hive.merge.mapfiles]
** [hive.merge.mapredfiles | https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-hive.merge.mapredfiles]
** and maybe [hive.optimize.union.remove | https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-hive.optimize.union.remove]
(see following question)

Does *hive.merge.sparkfiles* affect *hive.optimize.union.remove* the way *hive.merge.mapfiles*
and *hive.merge.mapredfiles* do?

bq.  The merge is triggered if either of hive.merge.mapfiles or hive.merge.mapredfiles is
set to true. If the user has set hive.merge.mapfiles to true and hive.merge.mapredfiles to
false, the idea was that the number of reducers are few, so the number of files anyway is
small. However, with this optimization, we are increasing the number of files possibly by
a big margin. So, we merge aggressively.


> Insert overwrite table query has strange behavior when set hive.optimize.union.remove=true [Spark Branch]
> ---------------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-7810
>                 URL: https://issues.apache.org/jira/browse/HIVE-7810
>             Project: Hive
>          Issue Type: Sub-task
>          Components: Spark
>            Reporter: Na Yang
>            Assignee: Na Yang
>              Labels: TODOC-SPARK, TODOC15
>             Fix For: 1.1.0
>
>         Attachments: HIVE-7810.1-spark.patch
>
>
> Insert overwrite table query has strange behavior when the following are set:
> set hive.optimize.union.remove=true;
> set hive.mapred.supports.subdirectories=true;
> set hive.merge.mapfiles=true;
> set hive.merge.mapredfiles=true;
> We expect the following two sets of queries to return the same data, but they do not.
> 1)
> {noformat}
> insert overwrite table outputTbl1
> SELECT * FROM
> (
> select key, 1 as values from inputTbl1
> union all
> select * FROM (
>   SELECT key, count(1) as values from inputTbl1 group by key
>   UNION ALL
>   SELECT key, 2 as values from inputTbl1
> ) a
> )b;
> select * from outputTbl1 order by key, values;
> {noformat}
> Below is the query result:
> {noformat}
> 1	1
> 1	2
> 2	1
> 2	2
> 3	1
> 3	2
> 7	1
> 7	2
> 8	2
> 8	2
> 8	2
> {noformat}
> 2) 
> {noformat}
> SELECT * FROM
> (
> select key, 1 as values from inputTbl1
> union all
> select * FROM (
>   SELECT key, count(1) as values from inputTbl1 group by key
>   UNION ALL
>   SELECT key, 2 as values from inputTbl1
> ) a
> )b order by key, values;
> {noformat}
> Below is the query result:
> {noformat}
> 1	1
> 1	1
> 1	2
> 2	1
> 2	1
> 2	2
> 3	1
> 3	1
> 3	2
> 7	1
> 7	1
> 7	2
> 8	1
> 8	1
> 8	2
> 8	2
> 8	2
> {noformat}
> Some data is missing from the first result: it has 11 rows, while the second has 17.
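
To make the discrepancy in the issue description concrete, here is a minimal Python sketch that compares the two result listings as multisets. The rows are copied from the listings above; the names insert_overwrite_rows, plain_select_rows, and missing_rows are hypothetical and chosen only for this illustration. It does not run against Hive and says nothing about the cause.

```python
from collections import Counter

# Rows (key, values) copied from result listing 1 (insert overwrite + select).
insert_overwrite_rows = [
    (1, 1), (1, 2), (2, 1), (2, 2), (3, 1), (3, 2),
    (7, 1), (7, 2), (8, 2), (8, 2), (8, 2),
]

# Rows (key, values) copied from result listing 2 (plain select).
plain_select_rows = [
    (1, 1), (1, 1), (1, 2), (2, 1), (2, 1), (2, 2),
    (3, 1), (3, 1), (3, 2), (7, 1), (7, 1), (7, 2),
    (8, 1), (8, 1), (8, 2), (8, 2), (8, 2),
]


def missing_rows(expected, actual):
    """Return rows present in `expected` but absent from `actual`, counting duplicates."""
    diff = Counter(expected) - Counter(actual)
    return sorted(diff.elements())


missing = missing_rows(plain_select_rows, insert_overwrite_rows)
print(len(plain_select_rows), len(insert_overwrite_rows), len(missing))
print(missing)
```

Every row this reports as missing has values = 1. That would be consistent with the output of one UNION ALL branch not reaching outputTbl1, though the listings alone do not show which branch or why.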



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
