hadoop-common-dev mailing list archives

From "Owen O'Malley (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-2960) A mapper should use some heuristics to decide whether to run the combiner during spills
Date Fri, 07 Mar 2008 06:46:58 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-2960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12576057#action_12576057 ]

Owen O'Malley commented on HADOOP-2960:
---------------------------------------

Computing the number of unique keys is not free. In particular, it will cost O(N) comparisons.
Even worse, this doesn't scale. Currently, combiners are only applied on the original spill,
where your approach could be done. However, we plan to apply combiners every time you write
to disk during the merge sort. There, you certainly can't count the duplicated keys without
a prohibitive cost. 

-1

Once we have HADOOP-2399, almost any reduction in the cost of network and disk i/o should
be worth the cost of the combiner.

> A mapper should use some heuristics to decide whether to run the combiner during spills
> ---------------------------------------------------------------------------------------
>
>                 Key: HADOOP-2960
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2960
>             Project: Hadoop Core
>          Issue Type: Bug
>            Reporter: Runping Qi
>
> Right now, the combiner, if set, will be called for each spill, no matter whether the
> combiner can actually reduce the values.
> The mapper should use some heuristics to decide whether to run the combiner during spills.
> One such heuristic is to check the ratio of the number of keys to the number
> of unique keys in the spill.
> The combiner will be called only if that ratio exceeds a certain threshold (say 2).
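
For illustration only, the proposed heuristic could be sketched roughly as below. This is not Hadoop code: the class name CombinerHeuristic, both method names, and the representation of a spill as a sorted in-memory list of keys are all assumptions made for the example. It also shows where the O(N) cost mentioned in the comment comes from: one comparison per adjacent pair of keys in the sorted spill.

```java
import java.util.List;

// Hypothetical helper, not part of Hadoop: decides whether a combiner is
// likely to pay off for one spill, given the spill's keys in sorted order.
final class CombinerHeuristic {

    // Counts distinct keys in an already sorted list.
    // Uses exactly size() - 1 comparisons for a non-empty list, i.e. O(N).
    static <K extends Comparable<K>> int countUniqueSorted(List<K> sortedKeys) {
        if (sortedKeys.isEmpty()) {
            return 0;
        }
        int unique = 1;
        for (int i = 1; i < sortedKeys.size(); i++) {
            // In a sorted run, a new key starts wherever neighbours differ.
            if (sortedKeys.get(i).compareTo(sortedKeys.get(i - 1)) != 0) {
                unique++;
            }
        }
        return unique;
    }

    // Returns true only if (total keys / unique keys) exceeds the threshold,
    // following the "say 2" example from the issue description.
    static <K extends Comparable<K>> boolean shouldRunCombiner(
            List<K> sortedKeys, double threshold) {
        int unique = countUniqueSorted(sortedKeys);
        if (unique == 0) {
            return false; // empty spill: nothing to combine
        }
        double ratio = (double) sortedKeys.size() / unique;
        return ratio > threshold;
    }
}
```

With a threshold of 2, a spill holding the keys a, a, a, a, b, b, c (7 keys, 3 unique) would run the combiner, while a spill of all-distinct keys would skip it. The sketch only covers the initial in-memory spill; it does not address the merge-time case raised in the comment above.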

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

