hive-issues mailing list archives

From "Xuefu Zhang (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-15239) hive on spark combine equivalentwork get wrong result because of tablescan operation compare
Date Thu, 01 Dec 2016 01:11:58 GMT

    [ https://issues.apache.org/jira/browse/HIVE-15239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15710435#comment-15710435 ]

Xuefu Zhang commented on HIVE-15239:
------------------------------------

Sorry for the delay.

Re: my point #1, I was referring to this:
{code}
      Set<Operator<?>> firstRootOperators = first.getAllRootOperators();
      Set<Operator<?>> secondRootOperators = second.getAllRootOperators();
      if (firstRootOperators.size() != secondRootOperators.size()) {
        return false;
      }

      // need to check paths and partition desc for MapWorks
      if (first instanceof MapWork && !compareMapWork((MapWork) first, (MapWork) second)) {
        return false;
      }
{code}
I think it would be better to reorder it as follows, so that each logical unit of code stays together.
{code}
      // need to check paths and partition desc for MapWorks
      if (first instanceof MapWork && !compareMapWork((MapWork) first, (MapWork) second)) {
        return false;
      }

      Set<Operator<?>> firstRootOperators = first.getAllRootOperators();
      Set<Operator<?>> secondRootOperators = second.getAllRootOperators();
      if (firstRootOperators.size() != secondRootOperators.size()) {
        return false;
      }
{code}

As for the exhaustive check, your fix will solve the problem described here. I even believe
it is possible to have two MapWorks that work on different partitions of the same table,
such as in the case of a union.
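
To make that scenario concrete, here is a minimal sketch. It does not use Hive's real
classes: SimpleMapWork, isEquivalent, the operator strings, and the /warehouse paths are all
hypothetical stand-ins chosen for illustration. It only shows why the input paths have to be
compared before two works with identical-looking operator trees can be treated as equivalent.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class EquivalentWorkSketch {

    /** Hypothetical stand-in for a map-side work unit; this is not Hive's MapWork. */
    static final class SimpleMapWork {
        final Set<String> inputPaths;     // assumed: partition directories the work reads
        final List<String> rootOperators; // assumed: simplified operator signatures

        SimpleMapWork(Set<String> inputPaths, List<String> rootOperators) {
            this.inputPaths = inputPaths;
            this.rootOperators = rootOperators;
        }
    }

    /** Compares the inputs first, then the operators. */
    static boolean isEquivalent(SimpleMapWork first, SimpleMapWork second) {
        // Works that read different paths are never equivalent,
        // even when their operators look the same.
        if (!first.inputPaths.equals(second.inputPaths)) {
            return false;
        }
        if (first.rootOperators.size() != second.rootOperators.size()) {
            return false;
        }
        return first.rootOperators.equals(second.rootOperators);
    }

    /** Small helper to build a path set. */
    static Set<String> paths(String... p) {
        return new LinkedHashSet<>(Arrays.asList(p));
    }

    public static void main(String[] args) {
        // Same operators, as with two branches of a union over one table.
        List<String> ops = Arrays.asList("TS[KEHHAO]", "FIL[KEHHAO = '2000721360']");
        SimpleMapWork p20 = new SimpleMapWork(paths("/warehouse/a1/end_dt=20161020"), ops);
        SimpleMapWork p21 = new SimpleMapWork(paths("/warehouse/a1/end_dt=20161021"), ops);
        SimpleMapWork p20Again = new SimpleMapWork(paths("/warehouse/a1/end_dt=20161020"), ops);

        System.out.println(isEquivalent(p20, p21));      // prints false
        System.out.println(isEquivalent(p20, p20Again)); // prints true
    }
}
```

If the path comparison were skipped, p20 and p21 would compare equal on operators alone and
could be merged, which is the kind of wrong result this issue is about.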

Overall, I feel more testing is needed for this feature. Of course this goes beyond the scope
of this JIRA.



> hive on spark combine equivalentwork get wrong result because of tablescan operation compare
> ---------------------------------------------------------------------------------------------
>
>                 Key: HIVE-15239
>                 URL: https://issues.apache.org/jira/browse/HIVE-15239
>             Project: Hive
>          Issue Type: Bug
>          Components: Spark
>    Affects Versions: 1.2.0, 2.1.0
>            Reporter: wangwenli
>            Assignee: Rui Li
>         Attachments: HIVE-15239.1.patch, HIVE-15239.2.patch
>
>
> env: hive on spark engine
> steps to reproduce:
> {code}
> create table a1(KEHHAO string, START_DT string) partitioned by (END_DT string);
> create table a2(KEHHAO string, START_DT string) partitioned by (END_DT string);
> alter table a1 add partition(END_DT='20161020');
> alter table a1 add partition(END_DT='20161021');
> insert into table a1 partition(END_DT='20161020') values('2000721360','20161001');
> SELECT T1.KEHHAO,COUNT(1) FROM ( 
> SELECT KEHHAO FROM a1 T 
> WHERE T.KEHHAO = '2000721360' AND '20161018' BETWEEN T.START_DT AND T.END_DT-1 
> UNION ALL 
> SELECT KEHHAO FROM a2 T
> WHERE T.KEHHAO = '2000721360' AND '20161018' BETWEEN T.START_DT AND T.END_DT-1 
> ) T1 
> GROUP BY T1.KEHHAO 
> HAVING COUNT(1)>1; 
> +-------------+------+--+
> |  t1.kehhao  | _c1  |
> +-------------+------+--+
> | 2000721360  | 2    |
> +-------------+------+--+
> {code}
> the query should return no records



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
