hadoop-hive-dev mailing list archives

From Ning Zhang <nzh...@facebook.com>
Subject Re: [jira] Commented: (HIVE-900) Map-side join failed if there are large number of mappers
Date Sat, 24 Oct 2009 03:09:11 GMT
Yes, that's the plan. You can also try the workaround of removing the
mapjoin hints.

Ning
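Removing the hint makes Hive fall back to its regular (common) join: rows from both tables are shuffled by join key and joined in the reducers, so the small table's block is no longer read by every mapper. Below is a minimal in-memory Python sketch of that reduce-side join pattern. It assumes rows are simple (key, value) string pairs, and the function names are hypothetical; it is not Hive's implementation.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Row = Tuple[str, str]              # (join key, value); a simplifying assumption
Tagged = Tuple[str, Tuple[str, str]]


def map_phase(big: Iterable[Row], small: Iterable[Row]) -> List[Tagged]:
    """Tag every row with the table it came from, keyed by the join key."""
    tagged = [(k, ("big", v)) for k, v in big]
    tagged += [(k, ("small", v)) for k, v in small]
    return tagged


def shuffle(tagged: Iterable[Tagged]) -> Dict[str, List[Tuple[str, str]]]:
    """Group tagged rows by join key, as the MapReduce shuffle would."""
    groups: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for key, tagged_value in tagged:
        groups[key].append(tagged_value)
    return groups


def reduce_phase(groups: Dict[str, List[Tuple[str, str]]]) -> List[Tuple[str, str, str]]:
    """Inner-join the rows from both tables that share a key."""
    out = []
    for key in sorted(groups):
        big_vals = [v for tag, v in groups[key] if tag == "big"]
        small_vals = [v for tag, v in groups[key] if tag == "small"]
        for b in big_vals:
            for s in small_vals:
                out.append((key, b, s))
    return out


def common_join(big: Iterable[Row], small: Iterable[Row]) -> List[Tuple[str, str, str]]:
    """Hypothetical end-to-end reduce-side join: map, shuffle, reduce."""
    return reduce_phase(shuffle(map_phase(big, small)))
```

For example, `common_join([("1", "a"), ("2", "b"), ("1", "c")], [("1", "x")])` yields `[("1", "a", "x"), ("1", "c", "x")]`. The cost is an extra shuffle of the big table, which is why the map join is preferred when it works.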

On Oct 23, 2009, at 7:52 PM, Venky Iyer (JIRA) wrote:

>
>    [ https://issues.apache.org/jira/browse/HIVE-900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12769573#action_12769573 ]
>
> Venky Iyer commented on HIVE-900:
> ---------------------------------
>
> This is a high-priority bug for me, blocking me on fairly important
> stuff. The workaround that Dhruba had, of downloading data to the
> client and adding it to the DistributedCache, is a pretty good solution.
>
>> Map-side join failed if there are large number of mappers
>> ---------------------------------------------------------
>>
>>                Key: HIVE-900
>>                URL: https://issues.apache.org/jira/browse/HIVE-900
>>            Project: Hadoop Hive
>>         Issue Type: Improvement
>>           Reporter: Ning Zhang
>>           Assignee: Ning Zhang
>>
>> Map-side join is efficient when joining a huge table with a small
>> table: each mapper reads the small table into main memory and
>> performs the join locally. However, if too many mappers are
>> generated for the map join, a large number of them will
>> simultaneously send requests to read the same block of the small
>> table. Currently Hadoop has an upper limit on the number of
>> requests for the same block (250?). If that limit is reached, a
>> BlockMissingException is thrown, which causes many mappers to be
>> killed. Retrying won't solve the problem; it only makes it worse.
>
> -- 
> This message is automatically generated by JIRA.
> -
> You can reply to this email to add a comment to the issue online.
>
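The map-side join described in the issue amounts to a hash join performed inside each mapper: build an in-memory table from the small table, then stream that mapper's split of the big table and probe it. Here is a minimal Python sketch under that reading, with hypothetical names and (key, value) rows assumed. In a real job every mapper would build its own copy of the table from the same HDFS file, and that is where the burst of block requests comes from.

```python
from typing import Dict, Iterable, Iterator, List, Tuple

Row = Tuple[str, str]  # (join key, value); a simplifying assumption


def build_hash_table(small: Iterable[Row]) -> Dict[str, List[str]]:
    """Load the whole small table into memory, keyed by join key.

    In the scenario from the issue, each mapper does this independently,
    so every mapper reads the same small-table block.
    """
    table: Dict[str, List[str]] = {}
    for key, value in small:
        table.setdefault(key, []).append(value)
    return table


def map_join(big_split: Iterable[Row], table: Dict[str, List[str]]) -> Iterator[Tuple[str, str, str]]:
    """Stream one mapper's split of the big table and probe the in-memory table."""
    for key, value in big_split:
        for small_value in table.get(key, []):
            yield key, value, small_value
```

No shuffle or reduce step is needed, which is what makes the map join cheap when the small table fits in memory.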

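A back-of-the-envelope model of why the DistributedCache workaround helps: when mappers read the small table straight from HDFS, the block sees one request per mapper, whereas a DistributedCache file is localized roughly once per node and shared by the mappers running there. This sketch assumes that simplified model (it ignores replication, timing, retries, and the client's one-time download) and takes the limit of 250 from the "(250?)" guess in the issue; all names are hypothetical.

```python
from typing import Sequence

# Assumption: limit taken from the "(250?)" guess in the issue text.
ASSUMED_BLOCK_REQUEST_LIMIT = 250


def direct_reads(mapper_nodes: Sequence[str]) -> int:
    """Simplified model: every mapper fetches the small table's block from HDFS itself."""
    return len(mapper_nodes)


def cached_reads(mapper_nodes: Sequence[str]) -> int:
    """Simplified model: the cached copy is localized once per node and shared there."""
    return len(set(mapper_nodes))


def exceeds_limit(requests: int, limit: int = ASSUMED_BLOCK_REQUEST_LIMIT) -> bool:
    """True if the modeled request count is above the assumed per-block limit."""
    return requests > limit
```

For example, 1,000 mappers spread over 100 nodes give 1,000 direct requests, above the assumed limit, but only 100 cached localizations.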
