hive-issues mailing list archives

From "Pengcheng Xiong (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HIVE-10062) HiveOnTez: Union followed by Multi-GB followed by Multi-insert loses data
Date Thu, 09 Apr 2015 21:35:12 GMT

     [ https://issues.apache.org/jira/browse/HIVE-10062?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pengcheng Xiong updated HIVE-10062:
-----------------------------------
    Attachment: HIVE-10062.02.patch

Addressed [~hagleitn]'s comments.

> HiveOnTez: Union followed by Multi-GB followed by Multi-insert loses data
> -------------------------------------------------------------------------
>
>                 Key: HIVE-10062
>                 URL: https://issues.apache.org/jira/browse/HIVE-10062
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Pengcheng Xiong
>            Assignee: Pengcheng Xiong
>            Priority: Critical
>         Attachments: HIVE-10062.01.patch, HIVE-10062.02.patch
>
>
> In the q.test environment with the src table, execute the following queries:
> {code}
> CREATE TABLE DEST1(key STRING, value STRING) STORED AS TEXTFILE;
> CREATE TABLE DEST2(key STRING, val1 STRING, val2 STRING) STORED AS TEXTFILE;
> FROM (select 'tst1' as key, cast(count(1) as string) as value from src s1
>                          UNION all 
>       select s2.key as key, s2.value as value from src s2) unionsrc
> INSERT OVERWRITE TABLE DEST1 SELECT unionsrc.key, COUNT(DISTINCT SUBSTR(unionsrc.value,5))
> GROUP BY unionsrc.key
> INSERT OVERWRITE TABLE DEST2 SELECT unionsrc.key, unionsrc.value, COUNT(DISTINCT SUBSTR(unionsrc.value,5))
> GROUP BY unionsrc.key, unionsrc.value;
> select * from DEST1;
> select * from DEST2;
> {code}
> DEST1 and DEST2 should both have 310 rows. However, DEST2 has only 1 row: "tst1    500    1".
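
As a rough illustration of the expected semantics, the query in the report can be emulated in plain Python. This is only a sketch: the `SRC` rows are a hypothetical toy stand-in for the real src table, and the helper names `union_src` and `count_distinct_by` are invented for this example. It assumes, as in the toy data, that each key maps to a single value, so that grouping by key and grouping by (key, value) yield the same number of groups.

```python
from collections import defaultdict

# Hypothetical toy stand-in for the src table (NOT the real q.test data):
# (key, value) rows, including one duplicate row.
SRC = [("238", "val_238"), ("86", "val_86"), ("238", "val_238"), ("311", "val_311")]


def union_src(src):
    """Emulate the UNION ALL subquery: one aggregate row plus every src row."""
    return [("tst1", str(len(src)))] + list(src)


def count_distinct_by(rows, group_key):
    """Emulate GROUP BY <group_key> with COUNT(DISTINCT SUBSTR(value, 5))."""
    groups = defaultdict(set)
    for key, value in rows:
        # Hive's SUBSTR is 1-based, so SUBSTR(value, 5) corresponds to value[4:].
        groups[group_key(key, value)].add(value[4:])
    return {group: len(distinct) for group, distinct in groups.items()}


rows = union_src(SRC)
dest1 = count_distinct_by(rows, lambda k, v: k)       # GROUP BY key
dest2 = count_distinct_by(rows, lambda k, v: (k, v))  # GROUP BY key, value

# Under the one-value-per-key assumption, both results have the same row count.
print(len(dest1), len(dest2))
```

In this emulation the 'tst1' row is just one group among many in both outputs. The behavior described in the report, where DEST2 ends up with only the 'tst1' row, corresponds to every group that originates from the second branch of the union being dropped.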



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
