hadoop-common-dev mailing list archives

From Eric Baldeschwieler <eri...@yahoo-inc.com>
Subject Re: [jira] Commented: (HADOOP-433) Better access to the RecordReader
Date Wed, 09 Aug 2006 23:21:44 GMT

Advancing the reader sounds like "a bad idea".

But an exotic reader might have all kinds of context it could
publish: maybe the current line number, a rowID, the SQL statement
used... Who knows. There could be lots of stuff.

It would be nice to have an interface that lets you get to any  
methods your subclassed reader has decided to publish.

Pushing this through the config doesn't seem right.  Having an  
available method a mapper can invoke does.
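
To make that concrete, here is a rough sketch of the kind of thing I mean. None of these names exist in Hadoop; ReaderContext, LineNumberInfo, LineReader and runMap are all made up for illustration, and the loop in runMap only stands in for the framework. The reader publishes a read-only view, the mapper asks for it by type, and advancing stays with the framework.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class ReaderContextSketch {

    /** Hypothetical: what a mapper would be handed instead of the reader itself. */
    interface ReaderContext {
        /** Returns the published view of the given type, or null if none is offered. */
        <T> T getPublished(Class<T> type);
    }

    /** Hypothetical view that a line-oriented reader might choose to publish. */
    interface LineNumberInfo {
        long currentLineNumber();
    }

    /** Hypothetical reader: next() is for the framework, only the view is published. */
    static class LineReader implements ReaderContext {
        private final Iterator<String> lines;
        private long lineNumber = 0;
        // Read-only view; it exposes the line number but no way to advance.
        private final LineNumberInfo view = () -> lineNumber;

        LineReader(List<String> input) {
            this.lines = input.iterator();
        }

        /** Advances to the next record; returns null at end of input. */
        String next() {
            if (!lines.hasNext()) {
                return null;
            }
            lineNumber++;
            return lines.next();
        }

        @Override
        public <T> T getPublished(Class<T> type) {
            return LineNumberInfo.class.equals(type) ? type.cast(view) : null;
        }
    }

    /** Stand-in for the framework loop: it alone advances the reader. */
    static List<String> runMap(List<String> input) {
        LineReader reader = new LineReader(input);
        List<String> output = new ArrayList<>();
        String value;
        while ((value = reader.next()) != null) {
            output.add(map(value, reader));
        }
        return output;
    }

    /** Stand-in for a mapper: it sees only the ReaderContext, never next(). */
    static String map(String value, ReaderContext context) {
        LineNumberInfo info = context.getPublished(LineNumberInfo.class);
        String prefix = (info == null) ? "?" : Long.toString(info.currentLineNumber());
        return prefix + ":" + value;
    }

    public static void main(String[] args) {
        // prints [1:alpha, 2:beta]
        System.out.println(runMap(Arrays.asList("alpha", "beta")));
    }
}
```

Nothing here has to be serializable, since the view is looked up on the live reader at map time and never goes through the config. A reader that publishes nothing just returns null, and the mapper can cope with that.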

On Aug 9, 2006, at 2:46 PM, Owen O'Malley wrote:

> On Aug 9, 2006, at 12:21 PM, Eric Baldeschwieler wrote:
>> Why not provide a pointer to the real record reader?  Seems like a  
>> valid OO way to get access to all kinds of things.
> Those attributes were put into the JobConf so that Hadoop could
> re-run an isolated task, so they had to be serializable. Putting real
> objects into the JobConf breaks that property.
> Ben hasn't explained why he wants the RecordReader, so I was trying  
> to guess. The problem with giving out references to the  
> RecordReader is that you are exposing the framework's  
> implementation details. In particular, all you can really do to a  
> record reader is advance it. That really isn't something that the  
> Mapper should be doing.
> -- Owen
