hive-dev mailing list archives

From "Carl Steinbach (JIRA)" <j...@apache.org>
Subject [jira] Resolved: (HIVE-1723) The result of left semi join is not correct
Date Mon, 07 Mar 2011 18:57:00 GMT

     [ https://issues.apache.org/jira/browse/HIVE-1723?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Carl Steinbach resolved HIVE-1723.
----------------------------------

    Resolution: Duplicate

> The result of left semi join is not correct
> -------------------------------------------
>
>                 Key: HIVE-1723
>                 URL: https://issues.apache.org/jira/browse/HIVE-1723
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Liyin Tang
>            Assignee: Liyin Tang
>
> In the test case semijoin.q, there is a query:
> select /*+ mapjoin(b) */ a.key from t3 a left semi join t1 b on a.key = b.key sort by a.key;
> I think this query will return a wrong result if table t1 has more than 25000 distinct keys.
> To keep it simple, I tried a very similar query:
> select /*+ mapjoin(b) */ a.key from test_semijoin a left semi join test_semijoin b on a.key = b.key sort by a.key;
> The table test_semijoin looks like this:
> 0     0
> 1     1
> 2     2
> 3     3
> 4     4
> 5     5
> ...    ...
> ...          ....
> 25000   25000
> 25001   25001
> ...          ....
> ...          ....
> 25999   25999
> 26000   26000
> Since the table is joined with itself on key, the correct result should be every key in test_semijoin, i.e. 0 through 26000.
> The actual result contains only part of that: keys 0 through 24544.
> 0
> 1
> 2
> ..
> ..
> 24543
> 24544
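
The expected result described in the report can be checked outside Hive. The following is a minimal sketch, not part of the original issue: it rebuilds the two-column table as shown (keys 0 through 26000, value equal to key) and computes a reference left semi join in plain Python. The function names, the tab-separated file layout, and the integer key type are all assumptions made for illustration.

```python
# Hypothetical helpers for reproducing the expected output of the report.
# Nothing here is Hive code; it is a plain-Python reference computation.

def make_rows(max_key=26000):
    """Rows (key, value) for keys 0..max_key inclusive, with value == key."""
    return [(k, k) for k in range(max_key + 1)]


def left_semi_join_keys(left, right):
    """Keys of `left` rows that have at least one matching key in `right`.

    Each left row appears at most once, which is the defining property
    of a left semi join. The result is sorted numerically (assumption:
    keys are integers).
    """
    right_keys = {k for k, _ in right}
    return sorted(k for k, _ in left if k in right_keys)


def write_tsv(rows, path):
    """Write rows as tab-separated text (assumed delimiter) for loading."""
    with open(path, "w") as f:
        for k, v in rows:
            f.write(f"{k}\t{v}\n")


rows = make_rows()
expected = left_semi_join_keys(rows, rows)
print(len(expected), expected[0], expected[-1])  # prints: 26001 0 26000
```

Under these assumptions the reference result has 26001 keys, so an output that stops at 24544 is missing the 1456 keys from 24545 through 26000.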

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
