hadoop-common-dev mailing list archives

From "Paul Sutter" <sut...@gmail.com>
Subject Re: [jira] Commented: (HADOOP-433) Better access to the RecordReader
Date Wed, 09 Aug 2006 22:27:55 GMT
just fyi, here are the reasons we tend to encounter for wanting the input filename in a map task:

- regex'd metadata from a filename (for example, date, data source, etc...
anything that exists on a per-file basis and not embedded in every record)
- error message ("illegal widget name", "file XXX"), usually in an output
record which is later reduced
- etc...

each of these is handled just fine by having the filename in the jobconf.
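
a rough sketch of both cases, in plain Java so it runs without Hadoop on the classpath. the assumptions here are mine: a Map stands in for the JobConf, the filename is assumed to be stored under the key "map.input.file", the file naming convention (<source>-<yyyyMMdd>.log) is made up, and InputFileMetadata is a hypothetical helper, not a Hadoop class.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Hadoop. A plain Map stands in for the JobConf.
final class InputFileMetadata {
    // Assumed key under which the framework stores the current input file.
    static final String INPUT_FILE_KEY = "map.input.file";

    // Assumed naming convention: <source>-<yyyyMMdd>.log, e.g. "weblogs-20060809.log".
    private static final Pattern NAME =
        Pattern.compile("([A-Za-z0-9_]+)-(\\d{8})\\.log$");

    // Extracts per-file metadata (data source, date) from the configured filename.
    // Returns an empty map if the key is missing or the name does not match.
    static Map<String, String> parse(Map<String, String> conf) {
        Map<String, String> meta = new LinkedHashMap<>();
        String path = conf.get(INPUT_FILE_KEY);
        if (path == null) {
            return meta;
        }
        Matcher m = NAME.matcher(path);
        if (m.find()) {
            meta.put("source", m.group(1));
            meta.put("date", m.group(2));
        }
        return meta;
    }

    // Builds an error record that names the offending file, suitable for
    // emitting as map output and reducing later.
    static String error(Map<String, String> conf, String message) {
        return message + " (file " + conf.get(INPUT_FILE_KEY) + ")";
    }
}
```

a mapper would call parse(...) once when it is configured and reuse the result for every record, since the values are per-file rather than per-record.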

I'm sure there are other purposes, but these are what we run into.

On 8/9/06, Owen O'Malley <owen@yahoo-inc.com> wrote:
> On Aug 9, 2006, at 12:21 PM, Eric Baldeschwieler wrote:
> > Why not provide a pointer to the real record reader?  Seems like a
> > valid OO way to get access to all kinds of things.
> Those attributes were put in to the JobConf so that Hadoop could re-
> run an isolated task, so they had to be serializable. Putting real
> objects into the JobConf breaks that property.
> Ben hasn't explained why he wants the RecordReader, so I was trying
> to guess. The problem with giving out references to the RecordReader
> is that you are exposing the framework's implementation details. In
> particular, all you can really do to a record reader is advance it.
> That really isn't something that the Mapper should be doing.
> -- Owen
