hadoop-mapreduce-issues mailing list archives

From "Hemanth Yamijala (JIRA)" <j...@apache.org>
Subject [jira] Updated: (MAPREDUCE-1466) FileInputFormat should save #input-files in JobConf
Date Sun, 14 Feb 2010 08:44:28 GMT

     [ https://issues.apache.org/jira/browse/MAPREDUCE-1466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hemanth Yamijala updated MAPREDUCE-1466:

    Attachment: MAPREDUCE-1466_yhadoop20-1.patch

Minor changes to the earlier patch in the newly attached one:

- Removed a System.err.println call from the old FileInputFormat. Please note that the same data
(the number of paths to process) is available via a log statement in getSplits as well.
- Removed a duplicate call to listStatus in the new FileInputFormat, which looked like this:
+    List<FileStatus>files = listStatus(job);
     for (FileStatus file: listStatus(job)) {
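
The duplicate-call fix can be illustrated with a minimal, self-contained sketch. This is not the
patch itself: ListOnceSketch, listFiles, and processInputs are hypothetical names, and a list of
strings stands in for the FileStatus list that listStatus(job) returns. The idea is simply to list
once and reuse the result for both the count and the loop.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

class ListOnceSketch {
    // Counts how often the (hypothetical) listing call runs.
    static final AtomicInteger listingCalls = new AtomicInteger();

    // Hypothetical stand-in for a potentially expensive call such as listStatus(job).
    static List<String> listFiles() {
        listingCalls.incrementAndGet();
        return Arrays.asList("part-00000", "part-00001", "part-00002");
    }

    // Lists the inputs once and reuses the same list for the count and the loop.
    static int processInputs() {
        List<String> files = listFiles();
        int numFiles = files.size();
        for (String file : files) {
            // Per-file split computation would go here.
        }
        return numFiles;
    }

    public static void main(String[] args) {
        int numFiles = processInputs();
        System.out.println(numFiles + " files, " + listingCalls.get() + " listing call(s)");
    }
}
```

Iterating over a second listFiles() call instead of the saved list would double the listing work
without changing the result, which is what the removed line did.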

I also suppose we need test cases for the new API. However, there are no tests for any of the
classes in the org.apache.hadoop.mapreduce.lib.input package, so this should possibly be a
separate JIRA.

Please let me know if the changes seem fine.

> FileInputFormat should save #input-files in JobConf
> ---------------------------------------------------
>                 Key: MAPREDUCE-1466
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1466
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: client
>            Reporter: Arun C Murthy
>            Assignee: Arun C Murthy
>            Priority: Minor
>             Fix For: 0.22.0
>         Attachments: MAPREDUCE-1466_yhadoop20-1.patch, MAPREDUCE-1466_yhadoop20.patch
> We already track the amount of data consumed by MR applications (MAP_INPUT_BYTES); along with that,
it would be useful to track #input-files from the client side for analysis. Along the lines of MAPREDUCE-1403,
it would be easy to stick this in the JobConf during job submission.
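
A rough sketch of the idea in the description, under stated assumptions: java.util.Properties
stands in for JobConf, and the key name "example.input.num.files" as well as the class and method
names are hypothetical, chosen only for illustration. The actual key and code are defined by the
attached patches.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Properties;

class NumInputFilesSketch {
    // Hypothetical configuration key, not the one used by the patch.
    static final String NUM_INPUT_FILES = "example.input.num.files";

    // Records the number of input files in the job configuration (Properties stands in for JobConf).
    static void recordInputFileCount(Properties conf, List<String> inputFiles) {
        conf.setProperty(NUM_INPUT_FILES, Integer.toString(inputFiles.size()));
    }

    // Reads the recorded count back, defaulting to 0 when nothing was recorded.
    static long readInputFileCount(Properties conf) {
        return Long.parseLong(conf.getProperty(NUM_INPUT_FILES, "0"));
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        recordInputFileCount(conf, Arrays.asList("a.txt", "b.txt"));
        System.out.println(NUM_INPUT_FILES + " = " + readInputFileCount(conf));
    }
}
```

Because the value travels with the job configuration, later analysis can read it without listing
the input paths again.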

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
