hadoop-common-dev mailing list archives

From "Devaraj Das (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-2771) changing the number of reduces dramatically changes the time of the map time
Date Thu, 24 Jul 2008 20:11:32 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-2771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12616622#action_12616622 ]

Devaraj Das commented on HADOOP-2771:
-------------------------------------

Christian, how big are the map outputs, and what is the value of io.sort.mb? Together these give a
rough estimate of the number of spills a map does. If a map spills multiple times while it is
running, one suspect is the final merge across the spill segments (the merge that produces the
final map output), which may be taking too much time. We'd also do a lot of seeks during that
merge. It would be nice to eliminate those seeks before merging the segments that belong to a
given partition, since we ultimately read the files sequentially anyway.

But again, the above depends on whether the map does multiple spills or not. 
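
For illustration only, the rough estimate described above can be sketched as follows. This is not
code from Hadoop. The class and method names are hypothetical, the 512 MB map output size is
invented, and the 0.8 spill fraction is an assumption about how full the sort buffer gets before
a spill starts. The sketch ignores the part of the buffer reserved for record accounting, and it
assumes every spill holds one segment per partition, so the number of segments the final merge
has to visit grows with both the spill count and the number of reduces.

```java
public class SpillEstimate {

    /**
     * Rough number of spills for one map: output bytes divided by the usable
     * part of the sort buffer, rounded up. spillFraction is an assumed value.
     */
    public static long estimateSpills(long mapOutputBytes, int ioSortMb, double spillFraction) {
        long usableBytes = (long) (ioSortMb * 1024L * 1024L * spillFraction);
        if (usableBytes <= 0) {
            throw new IllegalArgumentException("sort buffer must be positive");
        }
        // Ceiling division; a map always writes at least one spill.
        return Math.max(1, (mapOutputBytes + usableBytes - 1) / usableBytes);
    }

    /**
     * Rough number of segments the final merge touches, assuming one segment
     * per partition (reduce) in every spill file.
     */
    public static long estimateSegments(long spills, int numReduces) {
        return spills * numReduces;
    }

    public static void main(String[] args) {
        long mapOutputBytes = 512L * 1024 * 1024; // hypothetical map output size
        int ioSortMb = 100;                       // value of io.sort.mb to try
        double spillFraction = 0.8;               // assumed spill threshold

        long spills = estimateSpills(mapOutputBytes, ioSortMb, spillFraction);
        // Reduce counts taken from the issue description.
        for (int reduces : new int[] {2500, 7500, 25000}) {
            System.out.println(reduces + " reduces: " + spills + " spills, about "
                    + estimateSegments(spills, reduces) + " segments in the final merge");
        }
    }
}
```

If the estimate comes out at a single spill, there is no merge across spill segments and this
explanation would not apply.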

> changing the number of reduces dramatically changes the time of the map time
> ----------------------------------------------------------------------------
>
>                 Key: HADOOP-2771
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2771
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.17.1
>            Reporter: Owen O'Malley
>
> By changing the number of reduces, the time for an individual map changes radically.
> By running the same program and data with different numbers of reduces (2500, 7500, 25000)
> the times for each map changed radically (0:50, 1:20, 5h).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

