hive-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Chaoyu Tang (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-9534) incorrect result set for query that projects a windowed aggregate
Date Tue, 05 May 2015 17:52:00 GMT

    [ https://issues.apache.org/jira/browse/HIVE-9534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14528903#comment-14528903
] 

Chaoyu Tang commented on HIVE-9534:
-----------------------------------

Actually as of Oracle 11.2 (the version I tested was Oracle Database 11g Express Edition Release
11.2.0.2.0 - 64bit Production), the window function is not supported. Though the query "select
avg(distinct col1) over() from testwindow;", where avg acts on all rows, works, query with
windowing_clause does actually not work. for example:
{code}
Query: 
select avg(distinct col1) over(order by col2 rows between 1 preceding and 1 following) from
testwindow;
---
Error:
ORA-30487: ORDER BY not allowed here
30487. 00000 -  "ORDER BY not allowed here"
*Cause:    DISTINCT functions and RATIO_TO_REPORT cannot have an ORDER BY
{code}
Based on Oracle document http://docs.oracle.com/cd/E11882_01/server.112/e25554/analysis.htm
{code}
Note that the DISTINCT keyword is not supported in windowing functions except for MAX and
MIN.
{code}

Based on Hive plan for "select avg(distinct col1) over() from testwindow;" it looks like Hive
was computing the distinct value of col1:
{code}
STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 depends on stages: Stage-1

STAGE PLANS:
  Stage: Stage-1
    Map Reduce
      Map Operator Tree:
          TableScan
            alias: testwindow
            Statistics: Num rows: 6 Data size: 18 Basic stats: COMPLETE Column stats: NONE
            Select Operator
              expressions: col1 (type: int)
              outputColumnNames: col1
              Statistics: Num rows: 6 Data size: 18 Basic stats: COMPLETE Column stats: NONE
              Group By Operator
                aggregations: avg(DISTINCT col1)
                keys: col1 (type: int)
                mode: hash
                outputColumnNames: _col0, _col1
                Statistics: Num rows: 6 Data size: 18 Basic stats: COMPLETE Column stats:
NONE
                Reduce Output Operator
                  key expressions: _col0 (type: int)
                  sort order: +
                  Statistics: Num rows: 6 Data size: 18 Basic stats: COMPLETE Column stats:
NONE
      Reduce Operator Tree:
        Group By Operator
          aggregations: avg(DISTINCT KEY._col0:0._col0)
          mode: mergepartial
          outputColumnNames: _col0
          Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE
          File Output Operator
            compressed: false
            Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE
            table:
                input format: org.apache.hadoop.mapred.TextInputFormat
                output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe

  Stage: Stage-0
    Fetch Operator
      limit: -1
      Processor Tree:
        ListSink
{code}



> incorrect result set for query that projects a windowed aggregate
> -----------------------------------------------------------------
>
>                 Key: HIVE-9534
>                 URL: https://issues.apache.org/jira/browse/HIVE-9534
>             Project: Hive
>          Issue Type: Bug
>          Components: SQL
>            Reporter: N Campbell
>            Assignee: Chaoyu Tang
>
> Result set returned by Hive has one row instead of 5
> {code}
> select avg(distinct tsint.csint) over () from tsint 
> create table  if not exists TSINT (RNUM int , CSINT smallint)
>  ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n' 
>  STORED AS TEXTFILE;
> 0|\N
> 1|-1
> 2|0
> 3|1
> 4|10
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message