hive-dev mailing list archives

From "Ashutosh Chauhan (JIRA)" <>
Subject [jira] [Commented] (HIVE-3403) user should not specify mapjoin to perform sort-merge bucketed join
Date Sun, 10 Feb 2013 04:17:15 GMT


Ashutosh Chauhan commented on HIVE-3403:

Thinking more about my point a) above, there are three potential join optimization opportunities:
a) Convert a JoinOperator to a non-bucketed MapJoinOperator.
b) Convert a JoinOperator to a bucketed MapJoinOperator.
c) Convert a JoinOperator to a sort-merge-bucketed MapJoinOperator.
Among these, c) doesn't need to buffer data in memory, so it can be decided completely at compile
time, which this patch enables. a) and b) buffer data in memory, so they need to be decided at run
time. a) is already taken care of in HIVE-3784.
So we are left with b). With this patch, we convert a JoinOperator to a bucketed
MapJoinOperator at compile time by attempting to convert a map-join operator (which will
be there because the user provided the hint). But ideally this should also be done at run time,
just like a). At run time we should first check whether the tables are bucketed. If they are,
check whether the required buckets of the smaller table fit in memory and, if they do, convert
the JoinOperator to a BMJ (bucketed map-join). If the table is not bucketed, check the size of
the whole small table and, if it fits, convert the join into a non-bucketed map-join. If we do
this, we can get rid of map-join hints completely.
That would be advantageous to users, since they would never have to provide hints in their
queries; the Hive optimizer would generate the best plan it can. It would also be advantageous
to Hive devs, since they would never have to bother about map-join operators in the query
compilation phase: a map-join operator would never be part of the plan at compile time. It would
only appear at run time, when a JoinOperator is optimized to a MapJoinOperator. This would
simplify semantic analysis, plan generation and compile-time optimizations a lot.
Namit, is this analysis correct? 
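
To make the proposed run-time check concrete, here is a rough sketch of the decision order described above. It is not Hive code: `SmallTable`, `JoinPlan`, `choose_join_plan` and the `memory_budget_bytes` parameter are hypothetical names used only for illustration. Falling back to a common (shuffle) join when nothing fits in memory is also an assumption, since the comment does not say what should happen in that case. Case c) is left out because it is decided at compile time.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class JoinPlan(Enum):
    COMMON_JOIN = "common join"            # assumed fallback, not stated in the comment
    MAP_JOIN = "non-bucketed map-join"     # case a)
    BUCKET_MAP_JOIN = "bucketed map-join"  # case b), BMJ


@dataclass
class SmallTable:
    """Hypothetical description of the smaller side of the join."""
    total_bytes: int
    # Sizes of the buckets a single task would need to load;
    # None means the table is not bucketed on the join key.
    required_bucket_bytes: Optional[List[int]] = None


def choose_join_plan(small: SmallTable, memory_budget_bytes: int) -> JoinPlan:
    """Pick a join plan at run time, following the order in the comment."""
    if small.required_bucket_bytes is not None:
        # Bucketed: only the required buckets have to fit in memory.
        if sum(small.required_bucket_bytes) <= memory_budget_bytes:
            return JoinPlan.BUCKET_MAP_JOIN
        return JoinPlan.COMMON_JOIN  # assumption: no conversion if buckets do not fit
    # Not bucketed: the whole small table has to fit in memory.
    if small.total_bytes <= memory_budget_bytes:
        return JoinPlan.MAP_JOIN
    return JoinPlan.COMMON_JOIN


if __name__ == "__main__":
    budget = 1_000
    print(choose_join_plan(SmallTable(10_000, [100, 200]), budget))
    print(choose_join_plan(SmallTable(500), budget))
    print(choose_join_plan(SmallTable(5_000), budget))
```

Under this sketch a bucketed table can qualify for a map-join even when the whole table is far larger than the memory budget, which is the reason for checking bucketing first.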

> user should not specify mapjoin to perform sort-merge bucketed join
> -------------------------------------------------------------------
>                 Key: HIVE-3403
>                 URL:
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Namit Jain
>            Assignee: Namit Jain
>         Attachments: hive.3403.10.patch, hive.3403.11.patch, hive.3403.12.patch, hive.3403.13.patch,
> hive.3403.14.patch, hive.3403.15.patch, hive.3403.16.patch, hive.3403.17.patch, hive.3403.18.patch,
> hive.3403.19.patch, hive.3403.1.patch, hive.3403.21.patch, hive.3403.22.patch, hive.3403.23.patch,
> hive.3403.24.patch, hive.3403.25.patch, hive.3403.26.patch, hive.3403.2.patch, hive.3403.3.patch,
> hive.3403.4.patch, hive.3403.5.patch, hive.3403.6.patch, hive.3403.7.patch, hive.3403.8.patch,
> Currently, in order to perform a sort merge bucketed join, the user needs
> to set hive.optimize.bucketmapjoin.sortedmerge to true, and also specify the 
> mapjoin hint.
> The user should not specify any hints.

