hadoop-common-dev mailing list archives

From "Runping Qi (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-2960) A mapper should use some heuristics to decide whether to run the combiner during spills
Date Fri, 07 Mar 2008 14:41:46 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-2960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12576214#action_12576214 ]

Runping Qi commented on HADOOP-2960:
------------------------------------


Unique key counting can be done as part of the sort. You don't need any extra computation at all.

I don't think the overhead of calling the combiner can be dismissed.
It does not make sense to call it when most keys are unique, which is very common if the number
of reducers is large.
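
To illustrate why the count is cheap once the keys are sorted, here is a minimal sketch. It is not Hadoop code: the class SpillKeyStats and the method countUniqueSortedKeys are hypothetical names, and keys are assumed to be plain strings. Equal keys are adjacent after the sort, so one linear pass that compares each key with its predecessor is enough. A real spill could presumably fold the same comparison into the pass that already writes the sorted records.

```java
import java.util.Arrays;

// Hypothetical helper, not part of Hadoop: counts distinct keys in sorted data.
final class SpillKeyStats {

    // Assumes sortedKeys is already sorted, so equal keys sit next to each other.
    // A key is counted only when it differs from the key just before it.
    static int countUniqueSortedKeys(String[] sortedKeys) {
        int unique = 0;
        for (int i = 0; i < sortedKeys.length; i++) {
            if (i == 0 || !sortedKeys[i].equals(sortedKeys[i - 1])) {
                unique++;
            }
        }
        return unique;
    }

    public static void main(String[] args) {
        // Stand-in for the keys buffered before a spill.
        String[] keys = {"b", "a", "b", "c", "a", "b"};
        Arrays.sort(keys);
        System.out.println(keys.length + " records, "
                + countUniqueSortedKeys(keys) + " unique keys");
    }
}
```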


> A mapper should use some heuristics to decide whether to run the combiner during spills
> ---------------------------------------------------------------------------------------
>
>                 Key: HADOOP-2960
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2960
>             Project: Hadoop Core
>          Issue Type: Bug
>            Reporter: Runping Qi
>
> Right now, the combiner, if set, will be called for each spill, no matter whether the combiner can actually reduce the values.
> The mapper should use some heuristics to decide whether to run the combiner during spills.
> One such heuristic is to check the ratio of the number of keys to the number of unique keys in the spill.
> The combiner will be called only if that ratio exceeds a certain threshold (say 2).
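
The ratio test proposed in the issue can be sketched as follows. This is only an illustration under stated assumptions: CombinerHeuristic and shouldRunCombiner are hypothetical names, not Hadoop APIs, and the threshold of 2 is the example value from the description. The comparison is written as a multiplication to avoid dividing by zero for an empty spill.

```java
// Hypothetical sketch of the proposed heuristic, not part of Hadoop.
final class CombinerHeuristic {

    // Returns true when totalKeys / uniqueKeys strictly exceeds the threshold.
    // An empty spill (uniqueKeys == 0) never triggers the combiner.
    static boolean shouldRunCombiner(long totalKeys, long uniqueKeys, double threshold) {
        if (uniqueKeys <= 0) {
            return false;
        }
        return totalKeys > threshold * uniqueKeys;
    }

    public static void main(String[] args) {
        double threshold = 2.0; // example value taken from the issue text
        // Many repeated keys: the combiner is likely to shrink the spill.
        System.out.println(shouldRunCombiner(1000, 100, threshold));
        // Mostly unique keys: the combiner call is probably wasted work.
        System.out.println(shouldRunCombiner(1000, 900, threshold));
    }
}
```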

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

