hadoop-common-dev mailing list archives

From "Owen O'Malley (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-5381) Extend HADOOP-3293 to MapReduce package also
Date Fri, 20 Mar 2009 16:20:50 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-5381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12683922#action_12683922 ]

Owen O'Malley commented on HADOOP-5381:

1. It should just use a map from hostname (String) to LongWritable to keep track of how many
bytes of the split are stored on each node. That will be much clearer. It should not be building
topology information here. There is a lot of totally unmotivated code in this patch.
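A minimal sketch of the per-host bookkeeping point 1 asks for. It assumes a hypothetical `BlockInfo` type that carries each overlapping block's replica hosts and the number of that block's bytes falling inside the split; neither name comes from the patch. It also uses `Long` in place of Hadoop's `LongWritable` so the example runs without Hadoop on the classpath.

```java
import java.util.HashMap;
import java.util.Map;

public class SplitHostBytes {

    /** Hypothetical description of one block that overlaps a split. */
    static final class BlockInfo {
        final String[] hosts;     // datanodes holding a replica of this block
        final long bytesInSplit;  // how many of this block's bytes fall inside the split

        BlockInfo(String[] hosts, long bytesInSplit) {
            this.hosts = hosts;
            this.bytesInSplit = bytesInSplit;
        }
    }

    /**
     * Sums, per hostname, the number of split bytes stored on that host.
     * Every replica host of a block is credited with that block's bytes.
     * No rack or topology information is built.
     */
    static Map<String, Long> bytesPerHost(BlockInfo[] blocks) {
        Map<String, Long> totals = new HashMap<>();
        for (BlockInfo block : blocks) {
            for (String host : block.hosts) {
                // Add this block's contribution to the host's running total.
                totals.merge(host, block.bytesInSplit, Long::sum);
            }
        }
        return totals;
    }
}
```

For example, a 100-byte block replicated on `h1` and `h2` and a 50-byte block on `h2` and `h3` yield `h1=100`, `h2=150`, `h3=50`.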

2. It is not at all clear that picking the top N locations is right, where N is the replication
factor. I think a heuristic that includes the top node and any node whose data size is within
50% of the top node's would be more appropriate.
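A sketch contrasting the two location-selection rules in point 2, given a map from hostname to split bytes. "Within 50%" is read here as "holding at least half as many split bytes as the top node". That reading, the `fraction` parameter, and both method names are assumptions for illustration, not the patch's code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class SplitHostSelector {

    // Hosts sorted by split bytes, largest first; ties broken by hostname
    // so the output is deterministic.
    static List<Map.Entry<String, Long>> sortedByBytes(Map<String, Long> bytesPerHost) {
        List<Map.Entry<String, Long>> entries = new ArrayList<>(bytesPerHost.entrySet());
        entries.sort((a, b) -> {
            int byBytes = Long.compare(b.getValue(), a.getValue());
            return byBytes != 0 ? byBytes : a.getKey().compareTo(b.getKey());
        });
        return entries;
    }

    // Rule questioned in the comment: the top N hosts, N = replication factor.
    static List<String> topN(Map<String, Long> bytesPerHost, int n) {
        List<String> hosts = new ArrayList<>();
        for (Map.Entry<String, Long> e : sortedByBytes(bytesPerHost)) {
            if (hosts.size() == n) break;
            hosts.add(e.getKey());
        }
        return hosts;
    }

    // Suggested heuristic: the top host plus any host holding at least
    // `fraction` (0.5 under the reading above) of the top host's split bytes.
    static List<String> topAndNear(Map<String, Long> bytesPerHost, double fraction) {
        List<String> hosts = new ArrayList<>();
        List<Map.Entry<String, Long>> entries = sortedByBytes(bytesPerHost);
        if (entries.isEmpty()) return hosts;
        long top = entries.get(0).getValue();
        for (Map.Entry<String, Long> e : entries) {
            if (e.getValue() < fraction * top) break; // sorted, so no later host qualifies
            hosts.add(e.getKey());
        }
        return hosts;
    }
}
```

With `h1=100`, `h2=150`, `h3=50`, `h4=10` and a replication factor of 3, `topN` returns `[h2, h1, h3]`. That list includes `h3`, which holds only a third of the top host's data. `topAndNear(map, 0.5)` returns `[h2, h1]`. Because the heuristic adapts to how the data is actually distributed, it does not pad the list up to a fixed count.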

> Extend HADOOP-3293 to MapReduce package also
> --------------------------------------------
>                 Key: HADOOP-5381
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5381
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: mapred
>            Reporter: Jothi Padmanabhan
>            Assignee: Jothi Padmanabhan
>             Fix For: 0.21.0
>         Attachments: hadoop-5381.patch
> HADOOP-3293 made changes to FileInputFormat to identify split locations that contribute
> most to the split. This functionality has to be added to the MapReduce.FileInputFormat too.

This message is automatically generated by JIRA.
