hadoop-common-dev mailing list archives

From "Jothi Padmanabhan (JIRA)" <j...@apache.org>
Subject [jira] Updated: (HADOOP-5698) Change org.apache.hadoop.examples.MultiFileWordCount to use new mapreduce api.
Date Thu, 30 Apr 2009 06:54:30 GMT

     [ https://issues.apache.org/jira/browse/HADOOP-5698?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jothi Padmanabhan updated HADOOP-5698:
--------------------------------------

    Status: Open  (was: Patch Available)

I think it would be better not to fold the fixes to the existing mapred.CombineInputFormat
into this Jira. Those fixes should be addressed in a separate Jira, and the patch for this
issue should be built on top of that one.

Some other points:
# MultiFileWordCount -- I do not think we should use the MultiFileLineRecordReader to read
from a CombineSplit. It is guaranteed to work only if the start offset is 0, which is not
necessarily true. Instead, the CombineFileRecordReader should be used.
# Minor -- Why does run return 2 (instead of 1, as in the existing code)?
# CombineFileInputFormat.createRecordReader -- Should this just return null, or should it
call super.createRecordReader?
# Minor -- CombineFileRecordReader -- Remove unused imports.
# Minor -- Wherever possible, keep the code/comments restricted to 80 columns.
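
To illustrate the start-offset concern in the first point, here is a minimal,
self-contained sketch. It does not use any Hadoop classes: OffsetAwareLineReader
and readLines are hypothetical names, and the "skip the partial first line"
convention is an assumption modeled on how line-oriented readers commonly
handle chunks that begin in the middle of a file.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Hadoop: returns the lines that belong to
// the chunk [start, start + length) of an in-memory "file".
class OffsetAwareLineReader {

    static List<String> readLines(byte[] data, long start, long length) {
        List<String> lines = new ArrayList<>();
        long end = start + length;
        int pos = (int) start;

        // A chunk that does not begin at offset 0 may start mid-line. That
        // partial line belongs to the previous chunk, so skip past it. Backing
        // up one byte first keeps the first line when the chunk happens to
        // start right after a newline.
        if (start != 0) {
            pos--;
            while (pos < data.length && data[pos] != '\n') {
                pos++;
            }
            pos++; // step over the newline
        }

        // Emit every line that starts before the end of the chunk, even if
        // the line itself runs past the end.
        while (pos < data.length && pos < end) {
            int lineStart = pos;
            while (pos < data.length && data[pos] != '\n') {
                pos++;
            }
            lines.add(new String(data, lineStart, pos - lineStart,
                    StandardCharsets.UTF_8));
            pos++; // step over the newline
        }
        return lines;
    }
}
```

With the 23-byte input "alpha\nbeta\ngamma\ndelta\n", readLines(data, 0, 8)
yields [alpha, beta] and readLines(data, 8, 15) yields [gamma, delta], so each
line is read exactly once. A reader that ignores the start offset and always
begins at 0 would return [alpha, beta, gamma] for the second chunk, which is
the kind of failure the first point warns about.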

> Change org.apache.hadoop.examples.MultiFileWordCount to use new mapreduce api.
> ------------------------------------------------------------------------------
>
>                 Key: HADOOP-5698
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5698
>             Project: Hadoop Core
>          Issue Type: Sub-task
>            Reporter: Amareshwari Sriramadasu
>            Assignee: Amareshwari Sriramadasu
>         Attachments: patch-5698.txt
>
>


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

