hive-dev mailing list archives

From "Ashutosh Chauhan (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-3403) user should not specify mapjoin to perform sort-merge bucketed join
Date Sat, 09 Feb 2013 20:57:13 GMT

    [ https://issues.apache.org/jira/browse/HIVE-3403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13575262#comment-13575262 ]

Ashutosh Chauhan commented on HIVE-3403:
----------------------------------------

I concur with Mark's comments above. I don't agree that erring on the side of more configs
is a good idea. For example, the if-else ladder after this patch will look like the following:

{code}
if user specifies map join hint && hive.optimize.bucketmapjoin is true, then a map-join
may be converted to BMJ.

if user specifies map join hint && hive.optimize.bucketmapjoin is true &&
hive.optimize.bucketmapjoin.sortedmerge is true, then a map-join may be converted to SMBJ.

if user doesn't specify map join hint && hive.optimize.bucketmapjoin is true &&
hive.optimize.bucketmapjoin.sortedmerge is true && hive.optimize.auto.convert.sortmerge.join
is true, then a regular join may be converted to SMBJ.

... and then there is hive.auto.convert.join, hive.auto.convert.join.noconditionaltask and
many others...
{code}
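
To make the combinatorics concrete, here is a minimal sketch of the ladder above as a single
function. It is only an illustration of the comment, not Hive's actual optimizer code: the
function name and return values are hypothetical, and the config keys are the ones named in
this thread, so their exact spelling should be treated as an assumption.

```python
# Config keys as named in this thread; treat the exact names as assumptions.
BMJ = "hive.optimize.bucketmapjoin"
SMB = "hive.optimize.bucketmapjoin.sortedmerge"
AUTO_SMB = "hive.optimize.auto.convert.sortmerge.join"


def possible_conversions(has_mapjoin_hint, conf):
    """Return the set of join conversions the ladder above would allow.

    conf maps config keys to booleans. A missing key counts as False,
    mirroring the situation where every optimization defaults to false.
    """
    bmj = conf.get(BMJ, False)
    smb = conf.get(SMB, False)
    auto_smb = conf.get(AUTO_SMB, False)

    allowed = set()
    # Rung 1: hint + bucketmapjoin -> bucketed map join.
    if has_mapjoin_hint and bmj:
        allowed.add("BMJ")
    # Rung 2: hint + bucketmapjoin + sortedmerge -> sort-merge bucketed join.
    if has_mapjoin_hint and bmj and smb:
        allowed.add("SMBJ")
    # Rung 3: no hint, but all three configs on -> sort-merge bucketed join.
    if not has_mapjoin_hint and bmj and smb and auto_smb:
        allowed.add("SMBJ")
    return allowed


# With every config left at false, nothing is converted, hint or no hint.
print(possible_conversions(True, {}))   # set()
```

Even in this reduced form the user has to reason about one hint and three flags to predict
which join runs, which is the complexity being objected to.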

Instead of simplifying the life of the user, which I believe is the original goal of this jira,
we are making his life more complicated by introducing even more configs which he needs to
understand. Btw, I am not even 100% sure I got the above settings right. Further, the fact that
the default value for every optimization is false means the user ends up in the worst of both
worlds, where none of the optimizations kicks in and the query runs slow.
To improve on the current state, my suggestions are the following:
a) Let's get rid of hints altogether, i.e., we never construct a logical plan with a
MapJoin/SMBJoin/BJoin operator but always with a regular join operator. Then, in the
optimization phase, we convert the regular join to the most optimal join implementation
depending on the sorting/bucketing properties and sizes of the tables. This will simplify the
codebase since we always see a regular join in our operator tree in the logical phase, thus
eliminating the need to handle the MapJoin operator at the logical level. It also simplifies the
interaction of hints and configs, e.g. scenarios where the user provided a hint but the config
is off.
b) We should compress all these different configs into a smaller number of configs.
c) We should set the default value to true for all these configs.
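
As a rough sketch of what suggestion (a) could look like, the function below picks a join
implementation for a plain, hint-free join purely from table properties. Everything here is
hypothetical: the `TableInfo` fields, the `choose_join` name, the size threshold, and the order
of preference are assumptions made for illustration, and real eligibility checks would be more
involved.

```python
from dataclasses import dataclass


@dataclass
class TableInfo:
    """Hypothetical summary of what the optimizer knows about one join input."""
    size_bytes: int
    bucketed_on_join_key: bool = False
    sorted_on_join_key: bool = False


def choose_join(left, right, small_table_bytes=25_000_000):
    """Pick a join implementation for a regular join, with no hints involved.

    small_table_bytes is an illustrative threshold, not a Hive default.
    """
    both_bucketed = left.bucketed_on_join_key and right.bucketed_on_join_key
    both_sorted = left.sorted_on_join_key and right.sorted_on_join_key

    # Both inputs bucketed and sorted on the join key: sort-merge bucketed join.
    if both_bucketed and both_sorted:
        return "SMBJ"
    # One side small enough to hold in memory: plain map join.
    if min(left.size_bytes, right.size_bytes) <= small_table_bytes:
        return "MAPJOIN"
    # Both bucketed but neither side small: bucketed map join
    # (assumes individual buckets fit in memory).
    if both_bucketed:
        return "BMJ"
    # Otherwise leave it as a regular join.
    return "JOIN"
```

The point of the sketch is that the decision depends only on table metadata, so the logical
plan never needs to carry a MapJoin operator and the user never needs to supply a hint.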

Namit, do you think it's possible to do this, or do you see any problem with this plan?
                
> user should not specify mapjoin to perform sort-merge bucketed join
> -------------------------------------------------------------------
>
>                 Key: HIVE-3403
>                 URL: https://issues.apache.org/jira/browse/HIVE-3403
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Namit Jain
>            Assignee: Namit Jain
>         Attachments: hive.3403.10.patch, hive.3403.11.patch, hive.3403.12.patch, hive.3403.13.patch,
> hive.3403.14.patch, hive.3403.15.patch, hive.3403.16.patch, hive.3403.17.patch, hive.3403.18.patch,
> hive.3403.19.patch, hive.3403.1.patch, hive.3403.21.patch, hive.3403.22.patch, hive.3403.23.patch,
> hive.3403.24.patch, hive.3403.25.patch, hive.3403.26.patch, hive.3403.2.patch, hive.3403.3.patch,
> hive.3403.4.patch, hive.3403.5.patch, hive.3403.6.patch, hive.3403.7.patch, hive.3403.8.patch,
> hive.3403.9.patch
>
>
> Currently, in order to perform a sort merge bucketed join, the user needs
> to set hive.optimize.bucketmapjoin.sortedmerge to true, and also specify the 
> mapjoin hint.
> The user should not specify any hints.

