hadoop-mapreduce-dev mailing list archives

From "jay vyas (JIRA)" <j...@apache.org>
Subject [jira] [Created] (MAPREDUCE-5511) Multifilewc and the mapred.* API: Is the use of getPos() valid?
Date Mon, 16 Sep 2013 15:35:54 GMT
jay vyas created MAPREDUCE-5511:

             Summary: Multifilewc and the mapred.* API:  Is the use of getPos() valid?
                 Key: MAPREDUCE-5511
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5511
             Project: Hadoop Map/Reduce
          Issue Type: Bug
          Components: examples
            Reporter: jay vyas
            Priority: Minor

The MultiFileWordCount class in the Hadoop examples library uses a custom RecordReader that
switches between files. This behaviour can cause RawLocalFileSystem to break in a concurrent
environment because of the way buffering works: in RawLocalFileSystem, switching between streams
leaves the inner stream temporarily null, and the getPos() implementation in the custom
RecordReader for MultiFileWordCount calls that inner stream.

There are basically 2 ways to handle this:

1) Wrap the object returned by open() in RawLocalFileSystem so that it caches the value of
getPos() every time it is called, so that calls to getPos() can still return a valid long
even if the underlying stream is null. OR

2) Update the RecordReader in multifilewc so that it does not rely on the inner input stream,
and instead caches the position (or returns 0) if the stream cannot return a valid value.
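
To make option 2 concrete, here is a minimal sketch in plain Java. It deliberately does not use
the Hadoop API: the class PositionCache and its methods open(), read(), closeCurrent() and
getPos() are hypothetical names invented for illustration, and plain java.io streams stand in
for the file system streams. It only shows the idea of tracking the position independently of
the inner stream, so that getPos() stays valid while the stream is null between files.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

/**
 * Hypothetical helper (not part of Hadoop): counts bytes itself instead of
 * asking the inner stream, so getPos() never dereferences a null stream.
 */
final class PositionCache {
    private InputStream current;   // null while switching between files
    private long completedBytes;   // bytes consumed from streams already closed
    private long offsetInCurrent;  // bytes consumed from the current stream

    /** Start reading a new stream; the position within it starts at 0. */
    void open(InputStream in) {
        closeCurrent();
        current = in;
    }

    /** Read one byte, or return -1 at end of stream or when no stream is open. */
    int read() {
        if (current == null) {
            return -1;
        }
        try {
            int b = current.read();
            if (b >= 0) {
                offsetInCurrent++;
            }
            return b;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Close the current stream and fold its byte count into the running total. */
    void closeCurrent() {
        if (current == null) {
            return;
        }
        try {
            current.close();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            completedBytes += offsetInCurrent;
            offsetInCurrent = 0;
            current = null;
        }
    }

    /** Cumulative position; valid even when no stream is currently open. */
    long getPos() {
        return completedBytes + offsetInCurrent;
    }
}
```

In this sketch getPos() returns 0 before anything is opened and keeps returning the last
cumulative value after closeCurrent(). Whether a cumulative value is what the getPos()
contract actually expects is exactly the open question of this issue.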

The final question here is: is the RecordReader for MultiFileWordCount doing the right thing,
or is it breaking the contract of getPos()? And really, what SHOULD getPos() return
if the underlying stream has already been consumed?
