hive-dev mailing list archives

From "Yin Huai (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-2340) optimize orderby followed by a groupby
Date Thu, 17 Jan 2013 21:22:13 GMT

    [ https://issues.apache.org/jira/browse/HIVE-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13556608#comment-13556608 ]

Yin Huai commented on HIVE-2340:
--------------------------------

Let me explain why I introduced the fake RS operator instead of just removing
the original RS. When I was developing the patch for 2206, I found that the aggregation operator
(GBY) and the join operator (JOIN) use different logic to process the rows forwarded to them.
Although they both buffer rows, a GBY determines whether it needs to forward results to its children
in processOp, whereas a JOIN relies on endGroup to know when it should forward results. When
we have plans like GBY-GBY or JOIN-GBY, that difference in processing logic is fine. However,
when we have a plan like
{code}
GBY----                    GBY----
       \                          \
        ----JOIN    or             ----JOIN
       /                          /
GBY----                    JOIN---
{code}
we need operators between the child JOIN and the parent GBYs and JOINs to make sure the JOIN
processes rows correctly. This is also why CorrelationLocalSimulativeReduceSinkOperator
determines when to start the group of its children in processOp and leaves startGroup
and endGroup empty.
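
To make the difference concrete, here is a minimal, self-contained sketch. It is not Hive's
actual Operator API: the Op interface and the JoinLike, GroupByLike and FakeRsLike classes are
hypothetical stand-ins, and the plan is reduced to a single GBY parent feeding a JOIN. It only
illustrates the idea that a JOIN-like operator emits nothing unless something in front of it
signals group boundaries, and that an intermediate operator can derive those boundaries from
key changes inside its own process method.

```java
import java.util.ArrayList;
import java.util.List;

public class GroupSignalSketch {

    /** Hypothetical, simplified operator interface (not Hive's real Operator class). */
    interface Op {
        void startGroup();
        void process(String key, String value);
        void endGroup();
    }

    /** JOIN-like: buffers rows and only emits when endGroup() is called. */
    static class JoinLike implements Op {
        final List<String> out = new ArrayList<>();
        private final List<String> buffer = new ArrayList<>();
        private String currentKey;

        public void startGroup() { buffer.clear(); }
        public void process(String key, String value) { currentKey = key; buffer.add(value); }
        public void endGroup() {
            if (!buffer.isEmpty()) {
                out.add(currentKey + "=" + buffer);
                buffer.clear();
            }
        }
    }

    /** GBY-like: decides inside process() when to forward; never signals groups to its child. */
    static class GroupByLike implements Op {
        private final Op child;
        private String currentKey;
        private int count;

        GroupByLike(Op child) { this.child = child; }

        public void startGroup() { }
        public void endGroup() { }
        public void process(String key, String value) {
            if (currentKey != null && !currentKey.equals(key)) {
                flush(); // key changed: forward the finished aggregate
            }
            currentKey = key;
            count++;
        }
        void close() { if (currentKey != null) flush(); }
        private void flush() {
            child.process(currentKey, "count:" + count);
            currentKey = null;
            count = 0;
        }
    }

    /** Fake-RS-like: empty startGroup/endGroup; drives the child's groups from process(). */
    static class FakeRsLike implements Op {
        private final Op child;
        private String currentKey;

        FakeRsLike(Op child) { this.child = child; }

        public void startGroup() { } // intentionally empty
        public void endGroup() { }   // intentionally empty
        public void process(String key, String value) {
            if (!key.equals(currentKey)) {
                if (currentKey != null) child.endGroup();
                child.startGroup();
                currentKey = key;
            }
            child.process(key, value);
        }
        void close() { if (currentKey != null) child.endGroup(); }
    }

    /** Feeds keys a, a, b through GBY -> (optional fake RS) -> JOIN and returns the JOIN output. */
    static List<String> run(boolean withFakeRs) {
        JoinLike join = new JoinLike();
        FakeRsLike fake = new FakeRsLike(join);
        GroupByLike gby = new GroupByLike(withFakeRs ? fake : join);
        for (String key : new String[] {"a", "a", "b"}) {
            gby.process(key, "row");
        }
        gby.close();
        if (withFakeRs) fake.close();
        return join.out;
    }
}
```

In this toy setup, run(false) leaves the JOIN-like operator with buffered rows that are never
emitted, while run(true) yields one result per key because the intermediate operator calls
startGroup and endGroup on the child's behalf.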

Also, by replacing RSs with those fake RSs, I do not need to touch the GBYs and JOINs that
will be merged into the same Reduce phase. Since the input of the first operator on the Reduce
side is in the format [key, value, tag], I use the fake RSs to generate rows in the
same format.
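
As a rough illustration of that row format, the sketch below wraps a flat row into a
[key, value, tag] triple. The TaggedRow class, the wrap helper and the numKeyCols parameter
are hypothetical names made up for this example, and the assumption that the key columns come
first in the row is mine; none of this is taken from the patch.

```java
import java.util.ArrayList;
import java.util.List;

public class TaggedRowSketch {

    /** Hypothetical container for a row in [key, value, tag] form. */
    static final class TaggedRow {
        final List<Object> key;
        final List<Object> value;
        final int tag; // identifies which parent operator produced the row

        TaggedRow(List<Object> key, List<Object> value, int tag) {
            this.key = key;
            this.value = value;
            this.tag = tag;
        }

        @Override
        public String toString() {
            return "[" + key + ", " + value + ", " + tag + "]";
        }
    }

    /**
     * Splits a flat row into key and value columns and attaches the parent's tag.
     * Assumes (for illustration only) that the first numKeyCols columns are the key.
     */
    static TaggedRow wrap(List<?> row, int numKeyCols, int tag) {
        List<Object> key = new ArrayList<Object>(row.subList(0, numKeyCols));
        List<Object> value = new ArrayList<Object>(row.subList(numKeyCols, row.size()));
        return new TaggedRow(key, value, tag);
    }
}
```

A downstream GBY or JOIN written against this shape would not need to know whether the row
came from a real shuffle or from an in-process wrapper like this one.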

But this part of the work was implemented almost 2 years ago. Definitely let me know if
anything has changed and this fake RS is no longer needed.
                
> optimize orderby followed by a groupby
> --------------------------------------
>
>                 Key: HIVE-2340
>                 URL: https://issues.apache.org/jira/browse/HIVE-2340
>             Project: Hive
>          Issue Type: Sub-task
>          Components: Query Processor
>            Reporter: Navis
>            Assignee: Navis
>            Priority: Minor
>              Labels: perfomance
>         Attachments: ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.1.patch, ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.2.patch,
> ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.3.patch, ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.4.patch,
> ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.5.patch, HIVE-2340.1.patch.txt
>
>
> Before implementing optimizer for JOIN-GBY, try to implement RS-GBY optimizer (cluster-by
> following group-by).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
