hadoop-common-dev mailing list archives

From "Jothi Padmanabhan (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-4396) sort on 400 nodes is now slower than in 18
Date Fri, 17 Oct 2008 05:26:47 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-4396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12640433#action_12640433 ]

Jothi Padmanabhan commented on HADOOP-4396:
-------------------------------------------

OK, this might be a non-issue after all.

All my tests have been with mapred.reduce.parallel.copies=60 and tasktracker.http.threads=100.
This does not appear to be the ideal configuration for the cluster. Runping let me know that
he uses mapred.reduce.parallel.copies=30 and tasktracker.http.threads=50. With this
configuration, sort took the same time as in 18 and gridmix completed in 40+ minutes, which
is a reasonable time.
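
For reference, the two configurations compared above can be written out as Hadoop-style property entries. This is only a sketch in Python: the two property names and the four values come from this comment, while the dictionary names and the to_site_xml helper are made up for illustration, and which site file the entries belong in depends on the Hadoop version.

```python
import xml.etree.ElementTree as ET

# Values quoted in this comment. The dictionary names are illustrative only.
TESTED = {"mapred.reduce.parallel.copies": 60, "tasktracker.http.threads": 100}
RUNPING = {"mapred.reduce.parallel.copies": 30, "tasktracker.http.threads": 50}


def to_site_xml(settings):
    """Render a dict of settings as a Hadoop-style <configuration> document."""
    root = ET.Element("configuration")
    for name, value in settings.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = str(value)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Print the configuration that brought sort back to the release 18 time.
    print(to_site_xml(RUNPING))
```

Swapping RUNPING for TESTED gives the settings used in the slower runs.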

When mapred.reduce.parallel.copies=60 and tasktracker.http.threads=100, the load on the disks
of the individual nodes is fairly high towards the end of the map phase, because the reducers
are pulling in data from a lot more maps in parallel and possibly shuffling them to disk.
This seems to be causing the stragglers that we observed. However, slowing down the maps by
having them write in small chunks seems to mitigate this problem, as observed both with the
LocalFileSystem and with the RawLocalFileSystem when its writes are broken down into chunks.
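
To make "write in small chunks" concrete, here is a minimal sketch of the general idea in Python. It is not the code from the attached patch (which is not shown in this message); the write_in_chunks name and the 64 KB chunk size are assumptions chosen for illustration.

```python
import io


def write_in_chunks(out, data, chunk_size=64 * 1024):
    """Write data to out in pieces of at most chunk_size bytes.

    Returns the number of write calls issued. The chunk size is an
    illustrative default, not a value taken from the issue.
    """
    calls = 0
    view = memoryview(data)  # slice without copying the whole buffer
    for start in range(0, len(view), chunk_size):
        out.write(view[start:start + chunk_size])
        calls += 1
    return calls


if __name__ == "__main__":
    buf = io.BytesIO()
    n = write_in_chunks(buf, b"x" * 150000)
    print(n, len(buf.getvalue()))
```

The output written is identical to a single large write; only the number and size of the individual write calls change, which is the effect described above.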

> sort on 400 nodes is now slower than in 18
> ------------------------------------------
>
>                 Key: HADOOP-4396
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4396
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.19.0
>            Reporter: Jothi Padmanabhan
>            Assignee: Jothi Padmanabhan
>            Priority: Blocker
>             Fix For: 0.19.0
>
>         Attachments: 4396-v3.patch
>
>
> Sort on 400 nodes on Hadoop release 18 takes about 29 minutes, but on the 19 branch it
> takes about 32 minutes. This behavior is consistent.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

