hive-dev mailing list archives

From "Jet Guo (JIRA)" <j...@apache.org>
Subject [jira] [Created] (HIVE-9052) Missing grouping rows when multi-insert
Date Tue, 09 Dec 2014 10:03:12 GMT
Jet Guo created HIVE-9052:
-----------------------------

             Summary: Missing grouping rows when multi-insert
                 Key: HIVE-9052
                 URL: https://issues.apache.org/jira/browse/HIVE-9052
             Project: Hive
          Issue Type: Bug
    Affects Versions: 0.14.0
            Reporter: Jet Guo


Given the table and data below:

create table score (class string, student string, score int)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
------------------Data---------------
class1,Jack,7
class1,Mike,8
class2,Tom,7

The HQL

from score
INSERT OVERWRITE DIRECTORY '/tmp/dpp/hql1'
select class, student, count(score)
group by class, student grouping sets ((class), (class, student))

produces the following result:
----------hql1--------------
class1\N2
class1Jack1
class1Mike1
class2\N1
class2Tom1

And the HQL

from score
INSERT OVERWRITE DIRECTORY '/tmp/dpp/hql2'
select class, student, sum(score)
group by class, student grouping sets ((class), (class, student))

produces the following result:
----------hql2--------------
class1\N15
class1Jack7
class1Mike8
class2\N7
class2Tom7



However, if you run a single HQL statement containing both of the above inserts:

from score
INSERT OVERWRITE DIRECTORY '/tmp/dpp/hql1'
select class, student, count(score)
group by class, student grouping sets ((class), (class, student))
INSERT OVERWRITE DIRECTORY '/tmp/dpp/hql2'
select class, student, sum(score)
group by class, student grouping sets ((class), (class, student))

the results are missing the rows for the (class) grouping set, as shown below:

----------hql1--------------
class1Jack1
class1Mike1
class2Tom1

----------hql2--------------
class1Jack7
class1Mike8
class2Tom7
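
For reference, the rows each insert should produce can be derived independently of Hive. The following is a minimal Python sketch, not Hive code: it assumes only the three data rows listed above, represents NULL (\N) as None, and uses a hypothetical helper named grouping_sets to emulate the grouping sets ((class), (class, student)). Its output matches the single-insert results above, which is what the multi-insert statement would be expected to write to both directories.

```python
from collections import defaultdict

# The three rows from the report's sample data.
ROWS = [
    ("class1", "Jack", 7),
    ("class1", "Mike", 8),
    ("class2", "Tom", 7),
]


def grouping_sets(rows, agg):
    """Emulate GROUP BY class, student GROUPING SETS ((class), (class, student)).

    Hypothetical helper for illustration only; NULL is represented as None.
    """
    groups = defaultdict(list)
    for cls, student, score in rows:
        groups[(cls, None)].append(score)     # grouping set (class)
        groups[(cls, student)].append(score)  # grouping set (class, student)
    # Order by class, with the NULL-student row first within each class.
    ordered = sorted(groups, key=lambda k: (k[0], k[1] is not None, k[1] or ""))
    return [(cls, student, agg(groups[(cls, student)])) for cls, student in ordered]


# count(score) equals len() here because the sample data has no NULL scores.
expected_hql1 = grouping_sets(ROWS, len)
expected_hql2 = grouping_sets(ROWS, sum)

for row in expected_hql1:
    print("hql1", row)
for row in expected_hql2:
    print("hql2", row)
```

Each list has five rows; the two rows with a None student per result are the ones absent from the multi-insert output.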

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
