hive-issues mailing list archives

From "Teddy Choi (JIRA)" <>
Subject [jira] [Updated] (HIVE-17896) TopNKey: Create a standalone vectorizable TopNKey operator
Date Wed, 03 Jan 2018 07:48:03 GMT


Teddy Choi updated HIVE-17896:
    Attachment: HIVE-17896.5.patch

This fifth patch fixes the TestDanglingQOuts failure. I also ran the auto_sortmerge_join_2.q and
lateral_view_ppd.q tests and they passed, so they appear to be unrelated to this patch.

> TopNKey: Create a standalone vectorizable TopNKey operator
> ----------------------------------------------------------
>                 Key: HIVE-17896
>                 URL:
>             Project: Hive
>          Issue Type: New Feature
>          Components: Operators
>    Affects Versions: 3.0.0
>            Reporter: Gopal V
>            Assignee: Teddy Choi
>         Attachments: HIVE-17896.1.patch, HIVE-17896.3.patch, HIVE-17896.4.patch, HIVE-17896.5.patch
> For TPC-DS Query27, the TopN operation is delayed by the group-by - the group-by operator
> buffers up all the rows before discarding 99% of the rows in the TopN Hash within the
> ReduceSink Operator.
> The RS TopN operator is very restrictive as it only supports doing the filtering on the
> shuffle keys, but it is better to do this before breaking the vectors into rows and losing
> the isRepeating properties.
> Adding a TopN Key operator in the physical operator tree allows the following to happen.
> GBY->RS(Top=1)
> can become 
> TNK(1)->GBY->RS(Top=1)
> So that, the TopNKey can remove rows before they are buffered into the GBY and consume
> Here's the equivalent implementation in Presto
> Adding this as a sub-feature of GroupBy prevents further optimizations if the GBY is
> on keys "a,b,c" and the TopNKey is on just "a".
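
The filtering idea in the description above can be sketched as follows. This is only a minimal row-at-a-time illustration, not the code in the attached patch: the class name TopNKeyFilter and the method canForward are hypothetical, and a vectorized operator would work on whole batches instead. The filter remembers at most N distinct keys and forwards a row only if its key can still be among the top N, so it never buffers rows itself.

```java
import java.util.Comparator;
import java.util.TreeSet;

// Hypothetical sketch of a TopNKey-style filter; names are illustrative only.
final class TopNKeyFilter<K> {
    private final int topN;
    private final TreeSet<K> keys; // the best N distinct keys seen so far

    TopNKeyFilter(int topN, Comparator<? super K> order) {
        if (topN < 1) {
            throw new IllegalArgumentException("topN must be at least 1");
        }
        this.topN = topN;
        this.keys = new TreeSet<>(order);
    }

    // Returns true if a row with this key may still belong to the top N
    // and should be forwarded downstream (for example to the GBY).
    boolean canForward(K key) {
        if (keys.contains(key)) {
            return true; // key is already one of the current top N
        }
        if (keys.size() < topN) {
            keys.add(key); // still room for another distinct key
            return true;
        }
        K worst = keys.last();
        if (keys.comparator().compare(key, worst) < 0) {
            keys.pollLast(); // evict the current worst key
            keys.add(key);
            return true;
        }
        return false; // key sorts after all N tracked keys, so drop the row
    }
}
```

Rows forwarded early may later turn out to be outside the final top N, so the downstream RS(Top=N) is still needed for the exact result; the filter only reduces how many rows reach the GBY. Because it tracks only its own key, such a filter could in principle be keyed on just "a" while the GBY groups on "a,b,c".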

