hive-dev mailing list archives

From "Navis (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HIVE-4809) ReduceSinkOperator of PTFOperator can have redundant key columns
Date Sat, 17 Jan 2015 03:09:34 GMT

     [ https://issues.apache.org/jira/browse/HIVE-4809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Navis updated HIVE-4809:
------------------------
    Attachment: HIVE-4809.1.patch.txt

Most of this was already done by HIVE-4867.

> ReduceSinkOperator of PTFOperator can have redundant key columns
> ----------------------------------------------------------------
>
>                 Key: HIVE-4809
>                 URL: https://issues.apache.org/jira/browse/HIVE-4809
>             Project: Hive
>          Issue Type: Improvement
>          Components: PTF-Windowing
>    Affects Versions: 0.11.0
>            Reporter: Yin Huai
>            Assignee: Yin Huai
>         Attachments: HIVE-4809.1.patch.txt
>
>
> For example, consider this simple query:
> {code:sql}
> SELECT x.a, x.b, count(x.b) OVER (PARTITION BY x.a) FROM src x;
> {code}
> Its plan is:
> {code}
> STAGE DEPENDENCIES:
>   Stage-1 is a root stage
>   Stage-0 is a root stage
> STAGE PLANS:
>   Stage: Stage-1
>     Map Reduce
>       Alias -> Map Operator Tree:
>         x 
>           TableScan
>             alias: x
>             Reduce Output Operator
>               key expressions:
>                     expr: a
>                     type: int
>                     expr: a
>                     type: int
>               sort order: ++
>               Map-reduce partition columns:
>                     expr: a
>                     type: int
>               tag: -1
>               value expressions:
>                     expr: a
>                     type: int
>                     expr: b
>                     type: string
>       Reduce Operator Tree:
>         Extract
>           PTF Operator
>             Select Operator
>               expressions:
>                     expr: _col0
>                     type: int
>                     expr: _col1
>                     type: string
>                     expr: _wcol0
>                     type: bigint
>               outputColumnNames: _col0, _col1, _col2
>               File Output Operator
>                 compressed: false
>                 GlobalTableId: 0
>                 table:
>                     input format: org.apache.hadoop.mapred.TextInputFormat
>                     output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
>   Stage: Stage-0
>     Fetch Operator
>       limit: -1
> {code}
> The ReduceSinkOperator lists column "a" twice in its key expressions (sort order "++").
> This redundancy can increase the size of the map output.
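
To illustrate the redundancy described above, here is a minimal sketch (not Hive's actual code, and not the attached patch) of dropping repeated key expressions while keeping one sort-order character per remaining key. The function name dedupe_keys and the list/string representation of the plan are assumptions made purely for illustration.

```python
def dedupe_keys(key_exprs, sort_order):
    """Drop repeated key expressions, keeping the first occurrence.

    key_exprs  -- list of key expression names, e.g. ["a", "a"] (hypothetical model)
    sort_order -- one '+' or '-' per key expression, e.g. "++"
    """
    if len(key_exprs) != len(sort_order):
        raise ValueError("expected one sort-order character per key expression")

    seen = set()
    kept_exprs = []
    kept_order = []
    for expr, flag in zip(key_exprs, sort_order):
        if expr in seen:
            # A repeated expression cannot change the ordering: rows that tie
            # on its first occurrence also tie on every later occurrence.
            continue
        seen.add(expr)
        kept_exprs.append(expr)
        kept_order.append(flag)
    return kept_exprs, "".join(kept_order)


# The key expressions from the plan above: "a" appears twice with order "++".
keys, order = dedupe_keys(["a", "a"], "++")
print(keys, order)
```

Under this model the reduce sink would emit a single key column, so each map output record would carry one copy of "a" in its key instead of two.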



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
