hadoop-hive-dev mailing list archives

From "He Yongqiang (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HIVE-1093) Add a "skew join map join size" variable to control the input size of skew join's following map join job.
Date Wed, 27 Jan 2010 19:47:34 GMT

    [ https://issues.apache.org/jira/browse/HIVE-1093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12805629#action_12805629 ]

He Yongqiang commented on HIVE-1093:
------------------------------------

>>does it work for combinehiveinputsplit also ?
No. We should not use the combine input format for this, because CombineFileInputFormat uses
the block size as the minimum split size. We need to explicitly specify that the second job
uses HiveInputFormat. I will update the patch to do that.
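
To illustrate why the minimum split size matters here, the following is a rough sketch only, not code from the patch. It estimates how many map input splits a single 250 MB skew key would produce under two split sizes. The class and method names are hypothetical, the 128 MB block size and 32 MB requested split size are assumed values, and the ceiling division is a simplified model of how an input format cuts a file into splits.

```java
public class SplitEstimate {

    /** Number of input splits for a file of the given size, using ceiling division. */
    public static long numSplits(long inputBytes, long splitBytes) {
        if (inputBytes <= 0) {
            return 0;
        }
        return (inputBytes + splitBytes - 1) / splitBytes;
    }

    /** Split size actually used when the input format enforces a minimum split size. */
    public static long effectiveSplit(long requestedBytes, long minimumBytes) {
        return Math.max(requestedBytes, minimumBytes);
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long skewKeyBytes = 250 * mb; // size of one skew key, as reported in the issue
        long blockBytes = 128 * mb;   // assumed block size
        long requested = 32 * mb;     // assumed small split size for the follow-up map join

        // Combine-style format: the split size cannot drop below the block size.
        long withBlockMinimum = numSplits(skewKeyBytes, effectiveSplit(requested, blockBytes));

        // Format without that minimum: the requested split size is used as-is.
        long withoutMinimum = numSplits(skewKeyBytes, effectiveSplit(requested, 0));

        System.out.println("splits with block-size minimum: " + withBlockMinimum);
        System.out.println("splits without minimum: " + withoutMinimum);
    }
}
```

Under these assumed numbers, the block-size minimum leaves the skew key in 2 splits, while the smaller split size spreads it over 8, so more map tasks can share the work.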

> Add a "skew join map join size" variable to control the input size of skew join's following map join job.
> ---------------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-1093
>                 URL: https://issues.apache.org/jira/browse/HIVE-1093
>             Project: Hadoop Hive
>          Issue Type: Improvement
>            Reporter: He Yongqiang
>            Assignee: He Yongqiang
>         Attachments: hive-1093.patch
>
>
> In a test, many skew join keys were each larger than 250 MB, and the following map join job took several hours to process those big skew keys.
> This can be improved by using a smaller map input size for the following map join job.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
