hadoop-common-dev mailing list archives

From "Benjamin Reed (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-433) Better access to the RecordReader
Date Wed, 09 Aug 2006 14:21:18 GMT
    [ http://issues.apache.org/jira/browse/HADOOP-433?page=comments#action_12426928 ] 
Benjamin Reed commented on HADOOP-433:

If I understand correctly, you are suggesting that I instantiate a second RecordReader from
the information in JobConf. If I do that, progress will not be updated correctly unless I
also update it myself. At that point it is easier to keep the ThreadLocal variable hack I
currently use to get the real RecordReader. Either way it is just a few lines
of code, but it is ugly, silly, slightly confusing code.
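
For readers unfamiliar with the ThreadLocal workaround mentioned above, here is a minimal, self-contained sketch of the idea. It deliberately does not use the Hadoop classes: SimpleReader, FileBackedReader, ProgressWrapper and runMap are hypothetical stand-ins invented for illustration, and the real mapred interfaces differ. The sketch assumes the reader's next() and the map code run on the same thread.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ThreadLocalReaderSketch {

    /** Hypothetical stand-in for a record reader; NOT the Hadoop interface. */
    interface SimpleReader {
        String next();       // returns null when the input is exhausted
        float getProgress();
    }

    /** The "real" reader, which knows which file its split came from. */
    static class FileBackedReader implements SimpleReader {
        // Each call to next() publishes this reader to the calling thread.
        static final ThreadLocal<FileBackedReader> CURRENT = new ThreadLocal<>();

        private final String fileName;
        private final List<String> records;
        private int pos = 0;

        FileBackedReader(String fileName, List<String> records) {
            this.fileName = fileName;
            this.records = records;
        }

        String getFileName() {
            return fileName;
        }

        @Override
        public String next() {
            CURRENT.set(this);
            return pos < records.size() ? records.get(pos++) : null;
        }

        @Override
        public float getProgress() {
            return records.isEmpty() ? 1.0f : (float) pos / records.size();
        }
    }

    /** Wrapper that tracks progress and hides the real reader from callers. */
    static class ProgressWrapper implements SimpleReader {
        private final SimpleReader inner;
        float lastReportedProgress = 0.0f;

        ProgressWrapper(SimpleReader inner) {
            this.inner = inner;
        }

        @Override
        public String next() {
            String record = inner.next();
            lastReportedProgress = inner.getProgress();
            return record;
        }

        @Override
        public float getProgress() {
            return lastReportedProgress;
        }
    }

    /** Map-side code: it is handed only the wrapper, never the real reader. */
    static List<String> runMap(SimpleReader wrapped) {
        List<String> out = new ArrayList<>();
        try {
            String record;
            while ((record = wrapped.next()) != null) {
                // Recover the real reader through the thread-local side channel.
                FileBackedReader real = FileBackedReader.CURRENT.get();
                out.add(real.getFileName() + ":" + record);
            }
        } finally {
            // Avoid leaking the reference if the thread is reused.
            FileBackedReader.CURRENT.remove();
        }
        return out;
    }

    public static void main(String[] args) {
        SimpleReader wrapped = new ProgressWrapper(
                new FileBackedReader("part-00000", Arrays.asList("a", "b")));
        System.out.println(runMap(wrapped));
    }
}
```

Progress keeps flowing through the wrapper, which is the advantage over creating a second reader, but the map code now depends on a hidden side channel that breaks if reading and mapping ever move to different threads.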

> Better access to the RecordReader
> ---------------------------------
>                 Key: HADOOP-433
>                 URL: http://issues.apache.org/jira/browse/HADOOP-433
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: mapred
>    Affects Versions: 0.5.0
>            Reporter: Benjamin Reed
>            Priority: Minor
> The record reader has access to the FileSplit which can in turn have information that
> is useful to the Mapper. For example, Map processing may vary according to file name or attributes
> associated with a file. Unfortunately, even using a MapRunner you only have access to the
> progress wrapper of the RecordReader. To get access to the real record reader I had to use
> a thread local variable which I set in RecordReader.getNext(). It would be much nicer if you
> could get a reference to the real RecordReader from the RecordReader passed to MapRunner.


