hive-dev mailing list archives

From "Ted Xu (JIRA)" <j...@apache.org>
Subject [jira] Created: (HIVE-1997) Map join followed by multi-table insert will generate duplicated result
Date Thu, 17 Feb 2011 06:39:24 GMT
Map join followed by multi-table insert will generate duplicated result
-----------------------------------------------------------------------

                 Key: HIVE-1997
                 URL: https://issues.apache.org/jira/browse/HIVE-1997
             Project: Hive
          Issue Type: Bug
            Reporter: Ted Xu
             Fix For: 0.7.0


A map join followed by a multi-table insert generates duplicated results if the insert targets
include both a direct insert (FileSinkOperator logic) and a GROUP BY/DISTRIBUTE BY
(ReduceSinkOperator logic).
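To make the two kinds of insert target concrete, here is a heavily simplified Python model of the map-join output fanning out to both branches. It is a sketch under stated assumptions, not Hive's implementation: the functions `map_side` and `reduce_side` are hypothetical, and the model only shows the intended behavior. Every joined row should reach the direct-insert branch exactly once, while the GROUP BY branch is keyed for the reducers and collapsed there. The bug in this report breaks the first property.

```python
from collections import defaultdict

# Hypothetical, simplified model of the plan for the query in this report.
# These functions are illustrative only; they are not Hive operator classes.

def map_side(joined_rows):
    """Forward each map-join output row once to each insert branch."""
    direct_sink = []             # direct insert: rows written as-is (FileSinkOperator role)
    shuffle = defaultdict(list)  # GROUP BY branch: rows keyed for reducers (ReduceSinkOperator role)
    for key1, value1, key2, value2 in joined_rows:
        direct_sink.append((key1, value1))
        shuffle[(key2, value2)].append((key2, value2))
    return direct_sink, shuffle

def reduce_side(shuffle):
    """GROUP BY key2, value2: emit one row per distinct grouping key."""
    return sorted(shuffle)

# Toy map-join output rows: (key1, value1, key2, value2)
rows = [("1", "a", "1", "x"), ("1", "b", "1", "x"), ("2", "c", "2", "z")]
direct, shuffled = map_side(rows)
grouped = reduce_side(shuffled)
print(len(direct), len(grouped))  # prints: 3 2
```

In a correct plan the direct branch holds exactly as many rows as the join produced (3 here), while the grouped branch holds one row per distinct (key2, value2) pair (2 here).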

The following query reproduces the problem:
{code}
FROM
(SELECT /*+ MAPJOIN(x) */ x.key as key1, x.value as value1, y.key as key2, y.value as value2
 FROM src1 x JOIN src y ON (x.key = y.key)) subq
INSERT OVERWRITE TABLE destpart PARTITION (ds='2010-12-12')
SELECT key1, value1
INSERT OVERWRITE TABLE destpart PARTITION (ds='2010-12-13')
SELECT key2, value2
GROUP BY key2, value2;
{code}
In the query above, the records written to destpart (partition ds='2010-12-12') are duplicated.
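One way to check for the symptom is to compare the row count of the direct-insert partition with the row count of the join itself. The sketch below assumes Python's built-in sqlite3 module as a stand-in for Hive and uses small hypothetical rows in place of the real src1/src test tables. SQLite has neither the FROM ... INSERT multi-table form nor INSERT OVERWRITE, so each branch is written as a separate INSERT ... SELECT into an empty table. It shows the expected (correct) result; according to this report, the same comparison against destpart on an affected Hive build would show more rows in ds='2010-12-12' than the join produces.

```python
import sqlite3

# Hypothetical toy data standing in for Hive's src1 and src tables.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE src1 (key TEXT, value TEXT)")
cur.execute("CREATE TABLE src (key TEXT, value TEXT)")
cur.execute("CREATE TABLE destpart (key TEXT, value TEXT, ds TEXT)")
cur.executemany("INSERT INTO src1 VALUES (?, ?)", [("1", "a"), ("2", "b")])
cur.executemany("INSERT INTO src VALUES (?, ?)",
                [("1", "x"), ("1", "x"), ("2", "y"), ("3", "z")])

# Same subquery as in the report (without the MAPJOIN hint, which SQLite lacks).
subq = """SELECT x.key AS key1, x.value AS value1, y.key AS key2, y.value AS value2
          FROM src1 x JOIN src y ON (x.key = y.key)"""

# Branch 1: direct insert into partition ds='2010-12-12'.
cur.execute(f"INSERT INTO destpart SELECT key1, value1, '2010-12-12' FROM ({subq})")
# Branch 2: GROUP BY insert into partition ds='2010-12-13'.
cur.execute(f"INSERT INTO destpart SELECT key2, value2, '2010-12-13' "
            f"FROM ({subq}) GROUP BY key2, value2")

join_rows = cur.execute(f"SELECT COUNT(*) FROM ({subq})").fetchone()[0]
direct_rows = cur.execute(
    "SELECT COUNT(*) FROM destpart WHERE ds = '2010-12-12'").fetchone()[0]
grouped_rows = cur.execute(
    "SELECT COUNT(*) FROM destpart WHERE ds = '2010-12-13'").fetchone()[0]
print(join_rows, direct_rows, grouped_rows)  # prints: 3 3 2
```

Correct output has direct_rows equal to join_rows; a larger direct_rows value is the duplication described in this issue.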

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
