hive-dev mailing list archives

From "Na Yang (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-7810) Insert overwrite table query has strange behavior when set hive.optimize.union.remove=true [Spark Branch]
Date Wed, 20 Aug 2014 22:47:26 GMT

    [ https://issues.apache.org/jira/browse/HIVE-7810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14104730#comment-14104730 ]

Na Yang commented on HIVE-7810:
-------------------------------

Hi Chao,

Regarding your observation (1): the union optimization changes the operator tree and the task
tree so that the outer union is removed from the task tree. The move task is then expected to
do the union's work by merging the inner union result and the other map result into the
destination location.
  
Regarding your observation (2): I have fixed GraphTran in HIVE-7767, but that patch has not been
committed to the branch yet. That is why you see only two MapWork instances, not three, in the
dependency graph. Please wait until that patch is committed.

The behavior I reported in this JIRA is from my local build with the HIVE-7767 patch applied.


Thanks,
Na

> Insert overwrite table query has strange behavior when set hive.optimize.union.remove=true [Spark Branch]
> ---------------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-7810
>                 URL: https://issues.apache.org/jira/browse/HIVE-7810
>             Project: Hive
>          Issue Type: Task
>          Components: Spark
>            Reporter: Na Yang
>            Assignee: Na Yang
>
> An insert overwrite table query behaves strangely when the following are set:
> set hive.optimize.union.remove=true;
> set hive.mapred.supports.subdirectories=true;
> We expect the following two sets of queries to return the same data, but they do not.
> 1)
> {noformat}
> insert overwrite table outputTbl1
> SELECT * FROM
> (
> select key, 1 as values from inputTbl1
> union all
> select * FROM (
>   SELECT key, count(1) as values from inputTbl1 group by key
>   UNION ALL
>   SELECT key, 2 as values from inputTbl1
> ) a
> )b;
> select * from outputTbl1 order by key, values;
> {noformat}
> Below is the query result:
> {noformat}
> 1	1
> 1	2
> 2	1
> 2	2
> 3	1
> 3	2
> 7	1
> 7	2
> 8	2
> 8	2
> 8	2
> {noformat}
> 2) 
> {noformat}
> SELECT * FROM
> (
> select key, 1 as values from inputTbl1
> union all
> select * FROM (
>   SELECT key, count(1) as values from inputTbl1 group by key
>   UNION ALL
>   SELECT key, 2 as values from inputTbl1
> ) a
> )b order by key, values;
> {noformat}
> Below is the query result:
> {noformat}
> 1	1
> 1	1
> 1	2
> 2	1
> 2	1
> 2	2
> 3	1
> 3	1
> 3	2
> 7	1
> 7	1
> 7	2
> 8	1
> 8	1
> 8	2
> 8	2
> 8	2
> {noformat}
> Some rows are missing from the result of the first set of queries.
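
To make the gap concrete, here is a minimal Python sketch (not part of the original report) that compares the two result listings above as multisets. The row data is copied from the two {noformat} result blocks; the names insert_overwrite_rows, plain_select_rows, and missing_rows are hypothetical and chosen only for illustration.

```python
from collections import Counter

# Rows read back from outputTbl1 after the insert overwrite (query set 1).
insert_overwrite_rows = [
    (1, 1), (1, 2),
    (2, 1), (2, 2),
    (3, 1), (3, 2),
    (7, 1), (7, 2),
    (8, 2), (8, 2), (8, 2),
]

# Rows returned by the plain SELECT (query set 2).
plain_select_rows = [
    (1, 1), (1, 1), (1, 2),
    (2, 1), (2, 1), (2, 2),
    (3, 1), (3, 1), (3, 2),
    (7, 1), (7, 1), (7, 2),
    (8, 1), (8, 1), (8, 2), (8, 2), (8, 2),
]


def missing_rows(expected, actual):
    """Return rows that occur more often in `expected` than in `actual`.

    Duplicates matter for UNION ALL output, so the comparison is a
    multiset difference rather than a set difference.
    """
    diff = Counter(expected) - Counter(actual)
    return sorted(diff.elements())


missing = missing_rows(plain_select_rows, insert_overwrite_rows)
print(len(missing), missing)
```

Run as written, this reports six rows, all with values = 1, that appear in the plain SELECT output but not in outputTbl1. The reverse comparison is empty, so the insert overwrite result is a strict subset of the expected rows.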



--
This message was sent by Atlassian JIRA
(v6.2#6252)
