hadoop-common-dev mailing list archives

From "Tom White (JIRA)" <j...@apache.org>
Subject [jira] Updated: (HADOOP-3566) Create an InputFormat for reading lines of text as Java Strings
Date Mon, 16 Jun 2008 14:39:44 GMT

     [ https://issues.apache.org/jira/browse/HADOOP-3566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tom White updated HADOOP-3566:

    Attachment: hadoop-3566.patch

Patch implementing the proposal. The unit test TestJavaSerialization is now Writable-free.
So you can write mappers and reducers like this:

    // Needs java.io.IOException, java.util.Iterator, java.util.StringTokenizer
    // and org.apache.hadoop.mapred.*.
    static class WordCountMapper extends MapReduceBase implements
        Mapper<Long, String, String, Long> {

      // Emits (word, 1) for every whitespace-separated token in the line.
      public void map(Long key, String value,
          OutputCollector<String, Long> output, Reporter reporter)
          throws IOException {
        StringTokenizer st = new StringTokenizer(value);
        while (st.hasMoreTokens()) {
          output.collect(st.nextToken(), 1L);
        }
      }
    }

    static class SumReducer<K> extends MapReduceBase implements
        Reducer<K, Long, K, Long> {

      // Sums the counts collected for one key.
      public void reduce(K key, Iterator<Long> values,
          OutputCollector<K, Long> output, Reporter reporter)
          throws IOException {
        long sum = 0;
        while (values.hasNext()) {
          sum += values.next();
        }
        output.collect(key, sum);
      }
    }
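
To make the data flow of the mapper and reducer above easier to follow, here is a minimal plain-Java sketch that imitates the map, group and reduce steps without any Hadoop dependency. The class name WordCountSketch, its method names and the in-memory TreeMap standing in for the shuffle are assumptions made for illustration; they are not part of the patch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

// Hypothetical, Hadoop-free imitation of the word count job.
public class WordCountSketch {

  // Map step: emit (word, 1L) for each token. The sorted map stands in for
  // both the output collector and the shuffle, grouping values per key.
  static void map(Long offset, String line, Map<String, List<Long>> grouped) {
    StringTokenizer st = new StringTokenizer(line);
    while (st.hasMoreTokens()) {
      grouped.computeIfAbsent(st.nextToken(), k -> new ArrayList<>()).add(1L);
    }
  }

  // Reduce step: sum all values seen for one key.
  static long reduce(Iterator<Long> values) {
    long sum = 0;
    while (values.hasNext()) {
      sum += values.next();
    }
    return sum;
  }

  // Drives the map step over every line, then the reduce step over every key.
  static Map<String, Long> wordCount(List<String> lines) {
    Map<String, List<Long>> grouped = new TreeMap<>();
    long offset = 0;
    for (String line : lines) {
      map(offset, line, grouped);
      // Stand-in for a file offset: counts chars and assumes a one-char newline.
      offset += line.length() + 1;
    }
    Map<String, Long> totals = new TreeMap<>();
    for (Map.Entry<String, List<Long>> e : grouped.entrySet()) {
      totals.put(e.getKey(), reduce(e.getValue().iterator()));
    }
    return totals;
  }

  public static void main(String[] args) {
    // prints {be=2, not=1, or=1, to=2}
    System.out.println(wordCount(Arrays.asList("to be or not", "to be")));
  }
}
```

Keys and values here are ordinary Long and String objects throughout, which is the property the patch aims for in real jobs.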

> Create an InputFormat for reading lines of text as Java Strings
> ---------------------------------------------------------------
>                 Key: HADOOP-3566
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3566
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: mapred
>            Reporter: Tom White
>            Assignee: Tom White
>         Attachments: hadoop-3566.patch
> Such a StringInputFormat would be like TextInputFormat but with input types of Long and
> String, rather than LongWritable and Text. This would allow users to write MapReduce programs
> that used only Java native types (i.e. no Writables).
> This is currently not possible to write without changes to Hadoop due to a limitation
> in the RecordReader interface explained here: https://issues.apache.org/jira/browse/HADOOP-3413?focusedCommentId=12597935#action_12597935
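
The linked comment is the authoritative explanation of the limitation. One plausible reading, offered here only as an assumption, is that org.apache.hadoop.mapred.RecordReader hands out objects via createKey() and createValue() and then fills them in place through next(key, value), which suits mutable Writables but not immutable types such as Long and String. The sketch below illustrates that mismatch with a hypothetical FillInPlaceReader interface of the same shape; every class and method name in it is invented for illustration and none of it is Hadoop code.

```java
// Hypothetical illustration of a fill-in-place reader contract.
public class FillInPlaceSketch {

  // Same shape as createKey / createValue / next(key, value); not a Hadoop type.
  interface FillInPlaceReader<K, V> {
    K createKey();
    V createValue();
    boolean next(K key, V value);
  }

  // Mutable holders, playing the role that Writables play.
  static final class MutableOffset { long value; }
  static final class MutableLine { String value = ""; }

  // Works as intended: next() overwrites fields of the caller's objects.
  static final class HolderReader
      implements FillInPlaceReader<MutableOffset, MutableLine> {
    private final String[] lines;
    private int index = 0;
    HolderReader(String[] lines) { this.lines = lines; }
    public MutableOffset createKey() { return new MutableOffset(); }
    public MutableLine createValue() { return new MutableLine(); }
    public boolean next(MutableOffset key, MutableLine value) {
      if (index >= lines.length) return false;
      key.value = index;           // record index used as a stand-in offset
      value.value = lines[index++];
      return true;
    }
  }

  // Cannot work: Long and String are immutable, so next() has nothing to fill.
  static final class ImmutableReader implements FillInPlaceReader<Long, String> {
    private final String[] lines;
    private int index = 0;
    ImmutableReader(String[] lines) { this.lines = lines; }
    public Long createKey() { return 0L; }
    public String createValue() { return ""; }
    public boolean next(Long key, String value) {
      if (index >= lines.length) return false;
      // These assignments only rebind the local parameters;
      // the caller's key and value references are unaffected.
      key = (long) index;
      value = lines[index++];
      return true;
    }
  }

  // Returns the first line as seen by the caller through mutable holders.
  static String readWithHolders(String[] lines) {
    HolderReader reader = new HolderReader(lines);
    MutableOffset key = reader.createKey();
    MutableLine value = reader.createValue();
    reader.next(key, value);
    return value.value;
  }

  // Returns what the caller sees with immutable types: still the empty string.
  static String readWithStrings(String[] lines) {
    ImmutableReader reader = new ImmutableReader(lines);
    Long key = reader.createKey();
    String value = reader.createValue();
    reader.next(key, value);
    return value;
  }

  public static void main(String[] args) {
    String[] lines = {"hello world"};
    System.out.println("holders: [" + readWithHolders(lines) + "]");
    System.out.println("strings: [" + readWithStrings(lines) + "]");
  }
}
```

Under this reading, a reader for Long and String would need some way to return fresh objects for each record, which would explain why the issue says changes to Hadoop are required.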

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
