hive-dev mailing list archives

From "Ashutosh Chauhan (JIRA)" <>
Subject [jira] [Commented] (HIVE-2340) optimize orderby followed by a groupby
Date Thu, 17 Jan 2013 19:36:28 GMT


Ashutosh Chauhan commented on HIVE-2340:

Yeah, correct: JOIN-GBY and GBY-GBY are taken care of in ysmart as well. It's the group-by followed
by order-by case that is also of interest to me, and this patch already covers it.

Besides the scenarios covered by these two patches, I am also comparing the approaches they
take. I have only briefly looked at this patch, but the fundamental difference I can make out
between this approach and the ysmart approach is that here the RS is deduplicated, that is,
completely removed from the operator pipeline wherever possible (i.e. when the keys of the
subsequent RS are a superset of the earlier one's), thus fusing multiple MR jobs. Ysmart, on
the other hand, replaces the second RS with a new operator it introduces
(LocalSimulatedReduceSink?), which fakes the RS but doesn't let the plan split into 2 MR jobs,
and thus generates one MR job. I haven't thought this through completely, but on an initial
pass the approach of this patch seems better than ysmart's because:
* Here you don't need a new operator.
* Here you simplify the plan by eliminating operators, as opposed to ysmart, which
replaces the operator and thereby increases the complexity of the plan (by adding a new type
of operator).
* Ysmart currently serializes and deserializes the data as it passes through that new
operator, introducing an unnecessary performance penalty. Granted, this could be improved,
but the problem doesn't exist in the patch proposed on this jira to begin with.
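
To make the deduplication rule above concrete, here is a minimal toy sketch in Python. It is not Hive's actual optimizer code: the `Op` class, `dedup_reduce_sinks`, and `count_jobs` are hypothetical names invented for illustration, and the model assumes that it is the later RS that gets dropped and that each remaining RS forces one MR job. A real optimizer would also have to check sort order, partitioning, and reducer counts, all of which are ignored here.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Op:
    """Hypothetical stand-in for a plan operator (not a Hive class)."""
    kind: str                     # e.g. "TS", "RS", "GBY", "FS"
    keys: Tuple[str, ...] = ()    # only meaningful for RS in this toy model


def dedup_reduce_sinks(plan: List[Op]) -> List[Op]:
    """Drop an RS whose keys are a superset of the nearest kept upstream RS."""
    out: List[Op] = []
    prev_keys = None
    for op in plan:
        if op.kind == "RS":
            if prev_keys is not None and set(op.keys) >= set(prev_keys):
                continue  # assumed redundant: remove it from the pipeline
            prev_keys = op.keys
        out.append(op)
    return out


def count_jobs(plan: List[Op]) -> int:
    """Toy rule: each remaining RS forces one MR job (minimum of one job)."""
    return max(1, sum(1 for op in plan if op.kind == "RS"))


# Two stacked group-bys; the second RS keys are a superset of the first.
plan = [
    Op("TS"),
    Op("RS", ("a",)),
    Op("GBY"),
    Op("RS", ("a", "b")),
    Op("GBY"),
    Op("FS"),
]
fused = dedup_reduce_sinks(plan)
print(count_jobs(plan), count_jobs(fused))
```

In this model the second RS simply disappears from the plan, so no new operator type is needed. The ysmart-style alternative would instead keep an operator in that position, which is the extra plan complexity the bullets above refer to.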

There are certainly other scenarios that ysmart can cover and this patch does not (Yin, can
you list those?), but for the scenarios that are common, this approach seems
to be better.

There might be other differences between the approaches; please feel free to raise them.
> optimize orderby followed by a groupby
> --------------------------------------
>                 Key: HIVE-2340
>                 URL:
>             Project: Hive
>          Issue Type: Sub-task
>          Components: Query Processor
>            Reporter: Navis
>            Assignee: Navis
>            Priority: Minor
>              Labels: perfomance
>         Attachments: ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.1.patch, ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.2.patch,
> ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.5.patch, HIVE-2340.1.patch.txt
>
> Before implementing optimizer for JOIN-GBY, try to implement RS-GBY optimizer (cluster-by
> following group-by).

This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators