hadoop-mapreduce-issues mailing list archives

From "Hairong Kuang (JIRA)" <j...@apache.org>
Subject [jira] Updated: (MAPREDUCE-1981) Improve getSplits performance by using listFiles, the new FileSystem API
Date Tue, 17 Aug 2010 23:23:17 GMT

     [ https://issues.apache.org/jira/browse/MAPREDUCE-1981?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel

Hairong Kuang updated MAPREDUCE-1981:

    Attachment: mapredListFiles4.patch

This patch fixes two failing unit tests: TestCombineFileInputFormat and TestHarFileSystem.

For the first test, I found that my patch made a subtle change to CombineFileInputFormat.
The path filter in CombineFileInputFormat assumes that the path passed to the filter does not
include the scheme, hostname, etc. This differs from the other input formats, which make no
assumption about the Path format. My patch removes this restriction, so I had to modify
DummyFileInputFormat in TestCombineFileInputFormat so that it no longer makes this assumption.
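To illustrate the difference, here is a minimal, self-contained Java sketch. It does not use Hadoop's PathFilter or Path classes: paths are modeled as plain strings, and the class name PathFilterSketch, its methods, and the /user/data/ prefix are hypothetical. It shows why a filter that matches on the raw path string stops matching once it is handed a fully qualified path such as hdfs://namenode:8020/..., and one way (via java.net.URI) to make the match independent of scheme and authority.

```java
import java.net.URI;

/**
 * Hypothetical illustration (not Hadoop code): a filter that assumes a bare
 * path versus one that tolerates fully qualified paths.
 */
public class PathFilterSketch {

    // Fragile: assumes the string has no scheme or authority, e.g. "/user/data/part-0".
    static boolean fragileAccept(String path) {
        return path.startsWith("/user/data/");
    }

    // Tolerant: drops scheme and authority (if any) before matching.
    static boolean tolerantAccept(String path) {
        String p = URI.create(path).getPath();
        return p != null && p.startsWith("/user/data/");
    }

    public static void main(String[] args) {
        String bare = "/user/data/part-0";
        String qualified = "hdfs://namenode:8020/user/data/part-0";
        System.out.println(fragileAccept(bare));       // true
        System.out.println(fragileAccept(qualified));  // false
        System.out.println(tolerantAccept(bare));      // true
        System.out.println(tolerantAccept(qualified)); // true
    }
}
```

The fragile variant behaves correctly only as long as callers strip the scheme and host first, which is the kind of implicit contract the test's DummyFileInputFormat had to stop relying on.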

For the second test, it turned out that HarFileSystem did not have a correct implementation of
listLocatedStatus, so this patch adds one.

> Improve getSplits performance by using listFiles, the new FileSystem API
> ------------------------------------------------------------------------
>                 Key: MAPREDUCE-1981
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1981
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: job submission
>            Reporter: Hairong Kuang
>            Assignee: Hairong Kuang
>             Fix For: 0.22.0
>         Attachments: mapredListFiles.patch, mapredListFiles1.patch, mapredListFiles2.patch,
>                      mapredListFiles3.patch, mapredListFiles4.patch
> This jira will make FileInputFormat and CombineFileInputFormat use the new API, thus
> reducing the number of RPCs to the HDFS NameNode.
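As a rough illustration of where the RPC savings come from, the following Java sketch counts NameNode calls under a simplified, hypothetical model: the older approach issues one listing call per input directory plus one block-location lookup per file, while a located listing (what listFiles/listLocatedStatus provide) returns block locations together with each file status. The class and method names are hypothetical, and the model ignores that a large directory listing may be returned in several batches.

```java
import java.util.Map;

/**
 * Hypothetical cost model (not Hadoop code): NameNode calls needed to
 * gather file statuses and block locations for split computation.
 */
public class RpcCountSketch {

    // One listing per directory, then one block-location lookup per file.
    static int rpcsPerFileLookup(Map<String, Integer> filesPerDir) {
        int rpcs = 0;
        for (int files : filesPerDir.values()) {
            rpcs += 1;      // list the directory
            rpcs += files;  // fetch block locations for each file separately
        }
        return rpcs;
    }

    // One located listing per directory; block locations come back with the statuses.
    static int rpcsLocatedListing(Map<String, Integer> filesPerDir) {
        return filesPerDir.size();
    }

    public static void main(String[] args) {
        Map<String, Integer> input = Map.of("/in/a", 100, "/in/b", 50);
        System.out.println(rpcsPerFileLookup(input));  // 152
        System.out.println(rpcsLocatedListing(input)); // 2
    }
}
```

Under this model the per-file cost disappears, so the savings grow with the number of input files rather than the number of directories.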

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
