hadoop-common-dev mailing list archives

From "Runping Qi (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-450) Remove the need for users to specify the types of the inputs
Date Mon, 14 Aug 2006 23:04:14 GMT
    [ http://issues.apache.org/jira/browse/HADOOP-450?page=comments#action_12427980 ] 
            
Runping Qi commented on HADOOP-450:
-----------------------------------


It seems to me that it would be better to add getInputKeyClass/ValueClass methods to RecordReader
(named getKeyClass/getValueClass in the code below) instead of createKey()/createValue(),
since creating an object of these classes needs a reference to a JobConf object, and
RecordReader does not have such a reference.
The MapRunner class can then create the key/value objects by calling ReflectionUtils.newInstance.
MapRunner would look like:

  public void run(RecordReader input, OutputCollector output,
                  Reporter reporter)
    throws IOException {
    try {
      // ask the reader which key/value types it produces
      this.inputKeyClass = input.getKeyClass();
      this.inputValueClass = input.getValueClass();

      // allocate key & value instances that are re-used for all entries
      WritableComparable key =
        (WritableComparable)ReflectionUtils.newInstance(inputKeyClass, job);
      Writable value =
        (Writable)ReflectionUtils.newInstance(inputValueClass, job);

      // pick the mapper registered for this key/value type pair
      Class mapperClass = job.getMapperClassFor(this.inputKeyClass, this.inputValueClass);
      this.mapper = (Mapper)ReflectionUtils.newInstance(mapperClass, job);

      while (input.next(key, value)) {
        // map pair to output
        mapper.map(key, value, output, reporter);
      }
    } finally {
      mapper.close();
    }
  }

In the above code, the mapper class is obtained from the job object
through a new method of the JobConf class:
Class getMapperClassFor(Class keyClass, Class valueClass);
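
The allocation step in run() above can be tried out without Hadoop on the classpath. The following is only a rough sketch under stated assumptions: TypedReader, LineReader, LongHolder and TextHolder are hypothetical stand-ins for RecordReader and the Writable types, and newInstance uses plain Java reflection in place of ReflectionUtils.newInstance(cls, job). It shows the reader reporting its own key/value classes and the runner creating one re-usable instance of each.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-ins only; none of these types are Hadoop classes.
final class TypedReaderSketch {

    /** Mutable key holder, playing the role of a re-usable WritableComparable. */
    static final class LongHolder {
        long value;
        @Override public String toString() { return Long.toString(value); }
    }

    /** Mutable value holder, playing the role of a re-usable Writable. */
    static final class TextHolder {
        String text = "";
        @Override public String toString() { return text; }
    }

    /** A reader that reports the classes of the keys and values it produces. */
    interface TypedReader<K, V> {
        Class<K> getKeyClass();
        Class<V> getValueClass();
        boolean next(K key, V value);
    }

    /** Reads lines from memory; the key is the zero-based line index. */
    static final class LineReader implements TypedReader<LongHolder, TextHolder> {
        private final Iterator<String> lines;
        private long index;

        LineReader(List<String> lines) { this.lines = lines.iterator(); }

        public Class<LongHolder> getKeyClass() { return LongHolder.class; }
        public Class<TextHolder> getValueClass() { return TextHolder.class; }

        public boolean next(LongHolder key, TextHolder value) {
            if (!lines.hasNext()) {
                return false;
            }
            key.value = index++;
            value.text = lines.next();
            return true;
        }
    }

    /** Plain-reflection substitute for ReflectionUtils.newInstance(cls, job). */
    static <T> T newInstance(Class<T> cls) {
        try {
            return cls.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot instantiate " + cls.getName(), e);
        }
    }

    /** Allocates one key and one value from the reader's classes and re-uses them. */
    static <K, V> List<String> run(TypedReader<K, V> input) {
        K key = newInstance(input.getKeyClass());
        V value = newInstance(input.getValueClass());
        List<String> seen = new ArrayList<>();
        while (input.next(key, value)) {
            seen.add(key + "=" + value);   // stands in for mapper.map(...)
        }
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(run(new LineReader(Arrays.asList("a", "b"))));
    }
}
```

The runner never names LongHolder or TextHolder itself; it only sees what the reader reports, which is the point of the proposal.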

With these and a few other minor changes, this patch will also address JIRA issue 372.
The other changes include:

    Add the following methods to the JobConf class:
           public InputFormat getInputFormat(Path p)
           public void setInputFormat(Class theClass, Path p)
           public Class getMapperClassFor(Class keyClass, Class valueClass)
           public void setMapperClassFor(Class theClass, Class keyClass, Class valueClass)
    Replace the following in the MapTask class:
          final RecordReader rawIn =                  // open input
            job.getInputFormat().getRecordReader
            (FileSystem.get(job), split, job, reporter);
    with
          final RecordReader rawIn =                  // open input
            job.getInputFormat(split.getPath()).getRecordReader
            (FileSystem.get(job), split, job, reporter);
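
One way the per-path lookup behind getInputFormat(Path) might behave is sketched below. This is an illustration, not the patch: it uses java.nio.file.Path instead of Hadoop's Path, the marker classes TextFormat and SequenceFormat are hypothetical, and it assumes that a split's file sits somewhere under a registered input directory, so the lookup walks up the parent directories and falls back to a job-wide default.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the per-path part of JobConf; not Hadoop code.
final class PerPathFormatSketch {

    /** Marker classes standing in for TextInputFormat and SequenceFileInputFormat. */
    static final class TextFormat { }
    static final class SequenceFormat { }

    private final Map<Path, Class<?>> formatByPath = new HashMap<>();
    private final Class<?> defaultFormat;

    PerPathFormatSketch(Class<?> defaultFormat) {
        this.defaultFormat = defaultFormat;
    }

    /** Registers the input format class to use for everything under inputPath. */
    void setInputFormat(Class<?> formatClass, Path inputPath) {
        formatByPath.put(inputPath.normalize(), formatClass);
    }

    /**
     * Finds the format for a split by checking the split's path and then each
     * parent directory; returns the default if no ancestor is registered.
     */
    Class<?> getInputFormat(Path splitPath) {
        for (Path p = splitPath.normalize(); p != null; p = p.getParent()) {
            Class<?> format = formatByPath.get(p);
            if (format != null) {
                return format;
            }
        }
        return defaultFormat;
    }

    public static void main(String[] args) {
        PerPathFormatSketch job = new PerPathFormatSketch(TextFormat.class);
        job.setInputFormat(SequenceFormat.class, Paths.get("/data/seq"));
        System.out.println(job.getInputFormat(Paths.get("/data/seq/part-00000")).getSimpleName());
        System.out.println(job.getInputFormat(Paths.get("/data/text/log.txt")).getSimpleName());
    }
}
```

Whether the real lookup should match by exact path, by parent directory, or by some other rule is left open by the comment; the parent walk here is only one plausible choice.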
 

With these changes, the application will specify a MapReduce job in the following way:
            
        Configuration defaults = new Configuration();
        JobConf theJob = new JobConf(defaults, My.class);

        theJob.addInputPath(myInputPath_1);
        theJob.setInputFormat(SequenceFileInputFormat.class, myInputPath_1);

        theJob.addInputPath(myInputPath_2);
        theJob.setInputFormat(TextInputFormat.class, myInputPath_2);

        theJob.addInputPath(myInputPath_3);
        theJob.setInputFormat(SequenceFileInputFormat.class, myInputPath_3);

        theJob.setMapperClassFor(MapperClass_a.class, LongWritable.class, Text.class);
        theJob.setMapperClassFor(MapperClass_b.class, Key_1.class, Value_1.class);
        theJob.setMapperClassFor(MapperClass_c.class, Key_2.class, Value_2.class);

        ....
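
The setMapperClassFor/getMapperClassFor pair used above amounts to a table keyed by the (key class, value class) pair. A minimal sketch of such a table follows, assuming exact class matches and an in-memory map; the class MapperRegistrySketch and the mapper stand-ins LineMapper and CountMapper are hypothetical, and a real JobConf would more likely store this in its configuration properties.

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the mapper-selection part of JobConf; not Hadoop code.
final class MapperRegistrySketch {

    /** Stand-ins for user mapper classes. */
    static final class LineMapper { }
    static final class CountMapper { }

    private final Map<Map.Entry<Class<?>, Class<?>>, Class<?>> mappers = new HashMap<>();

    /** Builds the composite lookup key from a key class and a value class. */
    private static Map.Entry<Class<?>, Class<?>> pair(Class<?> keyClass, Class<?> valueClass) {
        return new SimpleImmutableEntry<Class<?>, Class<?>>(keyClass, valueClass);
    }

    /** Registers the mapper to use for inputs with exactly these key/value classes. */
    void setMapperClassFor(Class<?> mapperClass, Class<?> keyClass, Class<?> valueClass) {
        mappers.put(pair(keyClass, valueClass), mapperClass);
    }

    /** Returns the registered mapper, or fails if the type pair is unknown. */
    Class<?> getMapperClassFor(Class<?> keyClass, Class<?> valueClass) {
        Class<?> mapperClass = mappers.get(pair(keyClass, valueClass));
        if (mapperClass == null) {
            throw new IllegalArgumentException("No mapper registered for ("
                + keyClass.getName() + ", " + valueClass.getName() + ")");
        }
        return mapperClass;
    }

    public static void main(String[] args) {
        MapperRegistrySketch job = new MapperRegistrySketch();
        job.setMapperClassFor(LineMapper.class, Long.class, String.class);
        job.setMapperClassFor(CountMapper.class, String.class, Integer.class);
        System.out.println(job.getMapperClassFor(Long.class, String.class).getSimpleName());
    }
}
```

Exact matching keeps the lookup simple, but it means a reader that produces a subclass of a registered key or value type would not find a mapper; how to treat that case is a design question the comment does not settle.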



 


> Remove the need for users to specify the types of the inputs
> ------------------------------------------------------------
>
>                 Key: HADOOP-450
>                 URL: http://issues.apache.org/jira/browse/HADOOP-450
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: mapred
>    Affects Versions: 0.5.0
>            Reporter: Owen O'Malley
>         Assigned To: Owen O'Malley
>             Fix For: 0.6.0
>
>
> Currently, the application specifies the types of the input keys and values and the RecordReader
> checks them for consistency. It would make more sense to have the RecordReader define the
> types of keys that it will produce. Therefore, I propose that we add two new methods to RecordReader:
> WritableComparable createKey();
> Writable createValue();
> Note that I propose adding them to the RecordReader rather than the InputFormat, so that
> they can be specific to a particular input split.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
