hadoop-hive-dev mailing list archives

From "Ning Zhang (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HIVE-900) Map-side join failed if there are large number of mappers
Date Sat, 24 Oct 2009 07:01:59 GMT

    [ https://issues.apache.org/jira/browse/HIVE-900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12769602#action_12769602 ]

Ning Zhang commented on HIVE-900:
---------------------------------

The distributed cache is definitely one option. It also seems to support copying a file from
an hdfs: URI in addition to a local directory. However, according to the distributed cache
documentation, the cached file is copied at the beginning of the mapper task. This may cause
the same inbound network congestion if 3000 mappers try to copy the same file at the same
time. Or does the distributed cache use a smarter copying mechanism (hierarchical rather
than 1:ALL)? If not, distributing the jar file will face the same issue.
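
To make the 1:ALL versus hierarchical comparison concrete, here is a rough back-of-the-envelope
sketch. It does not describe how the distributed cache actually copies files; the function names
and the fan-out of 10 are assumptions chosen purely for illustration. It only counts copy rounds
and the peak number of simultaneous readers on any single source.

```python
from typing import Tuple


def flat_copy(readers: int) -> Tuple[int, int]:
    """1:ALL copy: every reader pulls from the single original source.

    Returns (rounds, peak simultaneous readers on any one source).
    """
    return 1, readers


def hierarchical_copy(readers: int, fanout: int) -> Tuple[int, int]:
    """Tree-style copy: each node that already holds the file serves
    at most `fanout` new readers per round (hypothetical model).

    Returns (rounds, peak simultaneous readers on any one source).
    """
    if fanout < 1:
        raise ValueError("fanout must be at least 1")
    holders = 1          # only the original source holds the file at first
    remaining = readers  # readers that still need a copy
    rounds = 0
    while remaining > 0:
        served = min(remaining, holders * fanout)
        holders += served    # served readers become sources in the next round
        remaining -= served
        rounds += 1
    return rounds, min(fanout, readers)


if __name__ == "__main__":
    print(flat_copy(3000))              # one round, 3000 readers on one source
    print(hierarchical_copy(3000, 10))  # a few rounds, at most 10 readers per source
```

Under this simplified model the hierarchical scheme trades a few extra rounds for a much lower
peak load on each source, which is the property the question above is asking about.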

> Map-side join failed if there are large number of mappers
> ---------------------------------------------------------
>
>                 Key: HIVE-900
>                 URL: https://issues.apache.org/jira/browse/HIVE-900
>             Project: Hadoop Hive
>          Issue Type: Improvement
>            Reporter: Ning Zhang
>            Assignee: Ning Zhang
>
> Map-side join is efficient when joining a huge table with a small table, because each mapper
> can read the small table into main memory and perform the join locally. However, if too many
> mappers are generated for the map join, a large number of them will simultaneously send
> requests to read the same block of the small table. Hadoop currently has an upper limit on
> the number of requests for the same block (250?). If that limit is reached, a
> BlockMissingException is thrown, which causes many mappers to be killed. Retrying does not
> solve the problem; it makes it worse.
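
For readers unfamiliar with the technique, the map-side join described above can be sketched as
a plain in-memory hash join. This is a minimal illustration, not Hive's implementation; the
function name and the (key, value) row layout are assumptions made for the example.

```python
def map_side_join(small_rows, big_rows):
    """Join two iterables of (key, value) rows on their key.

    small_rows is assumed to fit in memory; big_rows is streamed once.
    Yields (key, big_value, small_value) for every matching pair.
    """
    # Build phase: load the whole small table into an in-memory hash table.
    lookup = {}
    for key, value in small_rows:
        lookup.setdefault(key, []).append(value)

    # Probe phase: stream the big table and look up each key locally.
    for key, value in big_rows:
        for small_value in lookup.get(key, ()):
            yield key, value, small_value


if __name__ == "__main__":
    small = [(1, "a"), (2, "b")]
    big = [(1, "x"), (3, "y"), (2, "z"), (1, "w")]
    for row in map_side_join(small, big):
        print(row)
```

Each mapper runs the build phase on its own, so every mapper has to read the full small table.
That is why the number of concurrent reads of the same block grows with the number of mappers.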

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

