hadoop-common-dev mailing list archives

From "Arun C Murthy (JIRA)" <j...@apache.org>
Subject [jira] Updated: (HADOOP-2437) final map output not evenly distributed across multiple disks
Date Wed, 19 Dec 2007 09:12:43 GMT

     [ https://issues.apache.org/jira/browse/HADOOP-2437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arun C Murthy updated HADOOP-2437:
----------------------------------

    Status: Patch Available  (was: Open)

> final map output not evenly distributed across multiple disks
> -------------------------------------------------------------
>
>                 Key: HADOOP-2437
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2437
>             Project: Hadoop
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.16.0
>            Reporter: Christian Kunz
>            Assignee: Arun C Murthy
>            Priority: Blocker
>             Fix For: 0.15.2
>
>         Attachments: HADOOP-2437_1_20071218.patch, HADOOP-2437_1_20071218.patch
>
>
> It seems that the final merge output of map tasks for a particular job does not select
> the output location in random fashion.
> This results in a job with a lot of map tasks eventually running out of taskTrackers
> asking for more tasks because the disk with most of the map outputs eventually has less disk
> space than specified by mapred.local.dir.minspacestart.
> Maybe the start of round-robin selection of multiple locations should be randomized.
> In our case:
> 110,000 maps, each about 3GB final output, on a 1300 node cluster.
> Out of 4 locations and after processing about 79,000 maps, the selection for final map
> outputs 'file.out' looked like:
> location1: 24,000
> location2: 25
> location3: 55,000
> location4: 7
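
The reporter's suggestion (randomize where round-robin selection starts) can be sketched as follows. This is a minimal illustration only, not the attached patch and not Hadoop's actual allocator: the class name RandomStartRoundRobin, its methods, and the simulated free-space array are all hypothetical. It assumes every task builds its own selector, so a fixed starting index would send each task's first allocation to the same location.

```java
import java.util.Random;

/**
 * Hypothetical sketch of round-robin selection over several local
 * directories with a randomized starting index. All names here are
 * illustrative assumptions; none of this is taken from Hadoop.
 */
public class RandomStartRoundRobin {
    private final long[] freeBytes; // simulated free space per location
    private int next;               // index of the next location to try

    public RandomStartRoundRobin(long[] freeBytes, Random rng) {
        if (freeBytes.length == 0) {
            throw new IllegalArgumentException("need at least one location");
        }
        this.freeBytes = freeBytes.clone();
        // Randomized start: a fresh selector does not always begin at index 0.
        this.next = rng.nextInt(freeBytes.length);
    }

    /**
     * Picks the next location with at least 'size' bytes free, reserves
     * the space, and advances the cursor past the chosen location.
     * Returns the chosen index, or -1 if no location has enough space.
     */
    public int allocate(long size) {
        for (int i = 0; i < freeBytes.length; i++) {
            int candidate = (next + i) % freeBytes.length;
            if (freeBytes[candidate] >= size) {
                freeBytes[candidate] -= size;
                next = (candidate + 1) % freeBytes.length;
                return candidate;
            }
        }
        return -1;
    }

    /**
     * Simulates 'tasks' independent tasks, each creating its own selector
     * and writing a single output, and counts the outputs per location.
     */
    public static int[] simulate(int locations, int tasks, Random rng) {
        int[] counts = new int[locations];
        for (int t = 0; t < tasks; t++) {
            long[] free = new long[locations];
            java.util.Arrays.fill(free, 100L);
            RandomStartRoundRobin selector = new RandomStartRoundRobin(free, rng);
            counts[selector.allocate(3L)]++;
        }
        return counts;
    }
}
```

Calling RandomStartRoundRobin.simulate(4, 79000, new Random()) returns one count per location, which can be compared against the per-location counts reported above. Whether a fixed starting index is what actually caused that skew is not established by this message.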

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

