hadoop-common-dev mailing list archives

From "Benjamin Reed (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-433) Better access to the RecordReader
Date Fri, 11 Aug 2006 20:16:14 GMT
    [ http://issues.apache.org/jira/browse/HADOOP-433?page=comments#action_12427610 ] 
            
Benjamin Reed commented on HADOOP-433:
--------------------------------------

getSplit() would address my need. However, I can imagine that in the future we would like
access to the RecordReader itself, especially if we wanted to do sophisticated things like record
skipping using indexed files or random sampling. The MapRunner interface makes these kinds
of interesting accesses easy to do if you have access to a RecordReader that happens to implement
a more full-featured interface.
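
As a rough illustration of the kind of access described above, here is a minimal Java sketch.
It assumes a wrapper that exposes its delegate. Every name in it (Reader, SkippingReader,
ListReader, ProgressReader, getReal(), SamplingRunner) is a hypothetical stand-in and not
part of the Hadoop API; the in-memory list only plays the role of an indexed file.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for illustration; these are NOT the Hadoop interfaces.
interface Reader {
    /** Returns the next record, or null when the input is exhausted. */
    String next();
}

/** A richer reader that can jump ahead without returning the skipped records. */
interface SkippingReader extends Reader {
    void skip(int n);
}

/** Reads from an in-memory list; stands in for a reader over an indexed file. */
class ListReader implements SkippingReader {
    private final List<String> records;
    private int pos = 0;

    ListReader(List<String> records) { this.records = records; }

    public String next() { return pos < records.size() ? records.get(pos++) : null; }

    public void skip(int n) { pos = Math.min(records.size(), pos + n); }
}

/** Progress-counting wrapper that (by assumption) exposes the reader it wraps. */
class ProgressReader implements Reader {
    private final Reader real;
    private int count = 0;

    ProgressReader(Reader real) { this.real = real; }

    public String next() {
        String r = real.next();
        if (r != null) count++;
        return r;
    }

    int recordsRead() { return count; }

    Reader getReal() { return real; }
}

class SamplingRunner {
    /** Returns every stride-th record, skipping cheaply when the real reader allows it. */
    static List<String> run(ProgressReader in, int stride) {
        List<String> out = new ArrayList<>();
        Reader real = in.getReal();
        String rec;
        while ((rec = in.next()) != null) {
            out.add(rec);
            if (real instanceof SkippingReader) {
                // The richer interface lets us jump past the records we do not want.
                ((SkippingReader) real).skip(stride - 1);
            } else {
                // Fallback: read and discard the unwanted records one by one.
                for (int i = 1; i < stride && in.next() != null; i++) { }
            }
        }
        return out;
    }
}
```

With seven records and a stride of 3, the runner keeps the first, fourth and seventh records,
and the wrapper counts only the three records that were actually read through it.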

A more generic Split interface would be nice as well :)

> Better access to the RecordReader
> ---------------------------------
>
>                 Key: HADOOP-433
>                 URL: http://issues.apache.org/jira/browse/HADOOP-433
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: mapred
>    Affects Versions: 0.5.0
>            Reporter: Benjamin Reed
>            Priority: Minor
>
> The record reader has access to the FileSplit which can in turn have information that
> is useful to the Mapper. For example, Map processing may vary according to file name or attributes
> associated with a file. Unfortunately, even using a MapRunner you only have access to the
> progress wrapper of the RecordReader. To get access to the real record reader I had to use
> a thread local variable which I set in RecordReader.getNext(). It would be much nicer if you
> could get a reference to the real RecordReader from the RecordReader passed to MapRunner.
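
The thread-local workaround mentioned in the description might look roughly like the Java
sketch below. This is only a guess at its shape, not the reporter's actual code. Split,
RealReader, OpaqueWrapper, FileAwareMapper and ThreadLocalDemo are hypothetical names and
not Hadoop classes, and next() stands in for the getNext() call named in the description.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for illustration; these are NOT the Hadoop classes.
class Split {
    final String fileName;

    Split(String fileName) { this.fileName = fileName; }
}

class RealReader {
    /** Holds the reader most recently driven on the current thread. */
    static final ThreadLocal<RealReader> CURRENT = new ThreadLocal<>();

    private final Split split;
    private final String[] records;
    private int pos = 0;

    RealReader(Split split, String... records) {
        this.split = split;
        this.records = records;
    }

    Split getSplit() { return split; }

    String next() {
        // Publish this reader so code further down the call chain can find it.
        CURRENT.set(this);
        return pos < records.length ? records[pos++] : null;
    }
}

/** Wrapper that hides the real reader, like the progress wrapper in the description. */
class OpaqueWrapper {
    private final RealReader real;

    OpaqueWrapper(RealReader real) { this.real = real; }

    String next() { return real.next(); }
}

class FileAwareMapper {
    /** Tags a record with the file name recovered through the thread local. */
    static String map(String record) {
        RealReader reader = RealReader.CURRENT.get();
        return reader.getSplit().fileName + ":" + record;
    }
}

class ThreadLocalDemo {
    /** Drives the wrapper the way a map runner would, on a single thread. */
    static List<String> runAll(OpaqueWrapper in) {
        List<String> out = new ArrayList<>();
        String rec;
        while ((rec = in.next()) != null) {
            out.add(FileAwareMapper.map(rec));
        }
        return out;
    }
}
```

The sketch only works because the reader and the mapper run on the same thread, which is
part of why a direct reference to the real reader would be the cleaner option.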

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
