hadoop-common-dev mailing list archives

From "Jothi Padmanabhan (JIRA)" <j...@apache.org>
Subject [jira] Updated: (HADOOP-4683) Move the call to getMapCompletionEvents in ReduceTask.ReduceCopier.fetchOutputs to a separate thread
Date Fri, 12 Dec 2008 05:04:44 GMT

     [ https://issues.apache.org/jira/browse/HADOOP-4683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jothi Padmanabhan updated HADOOP-4683:

    Attachment: hadoop-4683.patch

Attaching a patch. 

A loadgen run on 100 nodes with 100K maps of 100 bytes each showed roughly a 3x performance
improvement (~800 seconds with the patch, ~2500 seconds without it):
bin/hadoop jar hadoop-$BUILD-test.jar loadgen \
-D test.randomtextwrite.bytes_per_map=$((100)) \
-D test.randomtextwrite.total_bytes=$((100*100000)) \
-D mapred.compress.map.output=false \
-r 1 \
-outKey org.apache.hadoop.io.Text \
-outValue org.apache.hadoop.io.Text \
-outFormat org.apache.hadoop.mapred.lib.NullOutputFormat \
-outdir fakeout

Testpatch results:

     [exec] -1 overall.  
     [exec]     +1 @author.  The patch does not contain any @author tags.
     [exec]     -1 tests included.  The patch doesn't appear to include any new or modified tests.
     [exec]                         Please justify why no tests are needed for this patch.
     [exec]     +1 javadoc.  The javadoc tool did not generate any warning messages.
     [exec]     +1 javac.  The applied patch does not increase the total number of javac compiler warnings.
     [exec]     +1 findbugs.  The patch does not introduce any new Findbugs warnings.
     [exec]     +1 Eclipse classpath. The patch retains Eclipse classpath integrity.

> Move the call to getMapCompletionEvents in ReduceTask.ReduceCopier.fetchOutputs to a
separate thread
> ----------------------------------------------------------------------------------------------------
>                 Key: HADOOP-4683
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4683
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>            Reporter: Devaraj Das
>            Assignee: Jothi Padmanabhan
>             Fix For: 0.20.0
>         Attachments: hadoop-4683.patch
> The method ReduceTask.ReduceCopier.fetchOutputs calls getMapCompletionEvents on every
iteration of its loop. Because there is a sleep inside getMapCompletionEvents, this can slow
down the shuffle scheduler in some cases. The call should be moved to a separate thread.
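
To illustrate the idea in the description, here is a minimal sketch of polling for completion events on a dedicated thread and handing them to the scheduling loop through a queue. It is not the attached patch: EventSource, EventFetcher, collect and fakeSource are hypothetical names invented for this example, and the fake source merely imitates a call that sleeps before returning.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

public class EventFetcherSketch {

    /** Hypothetical stand-in for whatever serves map completion events. */
    interface EventSource {
        /** Returns events starting at fromIndex; may sleep internally. */
        List<String> getEvents(int fromIndex) throws InterruptedException;
    }

    /** Polls the source on its own thread so the scheduling loop never sleeps in it. */
    static class EventFetcher extends Thread {
        private final EventSource source;
        private final BlockingQueue<String> queue;
        private volatile boolean running = true;
        private int fromIndex = 0;

        EventFetcher(EventSource source, BlockingQueue<String> queue) {
            this.source = source;
            this.queue = queue;
            setDaemon(true);
            setName("event-fetcher");
        }

        @Override
        public void run() {
            try {
                while (running) {
                    List<String> events = source.getEvents(fromIndex);
                    fromIndex += events.size();
                    queue.addAll(events);
                }
            } catch (InterruptedException e) {
                // Interrupted by shutdown(); fall through and let the thread exit.
            }
        }

        void shutdown() {
            running = false;
            interrupt();
        }
    }

    /** Scheduler-style loop: consumes events from the queue until `expected` have arrived. */
    static List<String> collect(EventSource source, int expected) throws InterruptedException {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        EventFetcher fetcher = new EventFetcher(source, queue);
        fetcher.start();
        List<String> seen = new ArrayList<>();
        try {
            while (seen.size() < expected) {
                // Short bounded wait; the sleep inside getEvents happens on the other thread.
                String event = queue.poll(50, TimeUnit.MILLISECONDS);
                if (event != null) {
                    seen.add(event);
                }
                // Copy scheduling work would run here on every iteration.
            }
        } finally {
            fetcher.shutdown();
            fetcher.join();
        }
        return seen;
    }

    /** Fake source for the sketch: sleeps, then returns up to `batch` events. */
    static EventSource fakeSource(int total, int batch, long sleepMs) {
        return fromIndex -> {
            Thread.sleep(sleepMs);
            List<String> out = new ArrayList<>();
            for (int i = fromIndex; i < Math.min(total, fromIndex + batch); i++) {
                out.add("map_" + i);
            }
            return out;
        };
    }

    public static void main(String[] args) throws InterruptedException {
        List<String> events = collect(fakeSource(10, 3, 20), 10);
        System.out.println(events.size() + " events received");
    }
}
```

With this arrangement the loop that schedules copies waits only on the queue, so a slow or sleeping event call delays the arrival of new events but not the scheduling of work that is already known.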

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
