hadoop-common-dev mailing list archives

From "Runping Qi (JIRA)" <j...@apache.org>
Subject [jira] Updated: (HADOOP-2093) DFS should provide partition information for blocks, and map/reduce should schedule avoid schedule mappers with the splits off the same file system partition at the same time
Date Tue, 23 Oct 2007 17:41:50 GMT

     [ https://issues.apache.org/jira/browse/HADOOP-2093?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Runping Qi updated HADOOP-2093:
-------------------------------

    Component/s: mapred
                 dfs
    Description: 
The summary is a bit long, but the basic idea is to make better use of multiple file system
partitions.
For example, suppose a map/reduce job has 100 splits local to a node, and these 100 splits
are spread across 4 file system partitions. If we allow 4 mappers to run concurrently, it is
better for each mapper to work on splits from a different file system partition. In the worst
case, all the mappers work on splits from the same file system partition, and the other three
partitions are not used at all.




> DFS should provide partition information for blocks, and map/reduce should schedule avoid
> schedule mappers with the splits off the same file system partition at the same time
> ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-2093
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2093
>             Project: Hadoop
>          Issue Type: New Feature
>          Components: dfs, mapred
>            Reporter: Runping Qi
>
> The summary is a bit long, but the basic idea is to make better use of multiple file system
> partitions.
> For example, suppose a map/reduce job has 100 splits local to a node, and these 100 splits
> are spread across 4 file system partitions. If we allow 4 mappers to run concurrently, it is
> better for each mapper to work on splits from a different file system partition. In the worst
> case, all the mappers work on splits from the same file system partition, and the other three
> partitions are not used at all.
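
The scheduling idea described above could be sketched roughly as follows. This is only an
illustration, assuming DFS could report a partition label for each split; the function name
pick_splits, the (split_id, partition) pairs, and the /diskN labels are all hypothetical and
are not part of any existing Hadoop API.

```python
from collections import defaultdict, deque


def pick_splits(splits, slots):
    """Pick up to `slots` splits, rotating across partitions.

    `splits` is a list of (split_id, partition) pairs. Both fields are
    hypothetical stand-ins for whatever DFS might report per block.
    """
    # Group pending splits by the partition that holds them.
    by_partition = defaultdict(deque)
    for split_id, partition in splits:
        by_partition[partition].append(split_id)

    chosen = []
    # Take one split from each partition in turn, so that concurrently
    # running mappers read from different partitions whenever possible.
    while len(chosen) < slots and by_partition:
        for partition in list(by_partition):
            queue = by_partition[partition]
            chosen.append((queue.popleft(), partition))
            if not queue:
                del by_partition[partition]
            if len(chosen) == slots:
                break
    return chosen


# The example from the description: 100 node-local splits spread over
# 4 partitions, with 4 mappers allowed to run concurrently.
splits = [(i, f"/disk{i % 4}") for i in range(100)]
first_wave = pick_splits(splits, slots=4)
partitions_used = {partition for _, partition in first_wave}
```

With the 100 splits above, the first wave of 4 mappers covers all 4 partitions. If every split
sits on a single partition, the sketch falls back to taking several splits from it, which
is the worst case the description mentions.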

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

