hadoop-mapreduce-user mailing list archives

From Harsh J <qwertyman...@gmail.com>
Subject Re: DBInputFormat / DBWritable question
Date Thu, 05 Aug 2010 02:41:05 GMT
AFAIK you don't really need Writable serialization if your job is a map-only
one; the OutputFormat/RecordWriter (if any) should take care of the output.
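
For illustration, here is a minimal sketch of a record that implements only
DBWritable and skips Writable entirely. The UserRecord class, its two columns,
and the fakeRow helper are hypothetical. The nested DBWritable interface is a
local stand-in that mirrors the two methods of Hadoop's DBWritable so the
sketch compiles without Hadoop on the classpath; in a real job you would
implement the Hadoop interface instead.

```java
import java.lang.reflect.Proxy;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class DbOnlyRecordSketch {

    // Stand-in (assumption) mirroring the two methods of Hadoop's DBWritable.
    // In a real job, implement the Hadoop interface rather than this copy.
    interface DBWritable {
        void write(PreparedStatement statement) throws SQLException;
        void readFields(ResultSet resultSet) throws SQLException;
    }

    // Hypothetical record: DBWritable only, no Writable, so there are no
    // DataInput/DataOutput methods to write.
    static class UserRecord implements DBWritable {
        long id;
        String name;

        @Override
        public void readFields(ResultSet rs) throws SQLException {
            // Column order is an assumption: (id, name).
            id = rs.getLong(1);
            name = rs.getString(2);
        }

        @Override
        public void write(PreparedStatement ps) throws SQLException {
            // Only used when writing rows back to a database.
            ps.setLong(1, id);
            ps.setString(2, name);
        }
    }

    // Hypothetical helper: a fake one-row ResultSet, so the sketch can be
    // exercised without a database connection.
    static ResultSet fakeRow(long id, String name) {
        return (ResultSet) Proxy.newProxyInstance(
            ResultSet.class.getClassLoader(),
            new Class<?>[] { ResultSet.class },
            (proxy, method, args) -> {
                switch (method.getName()) {
                    case "getLong":   return id;
                    case "getString": return name;
                    default:
                        throw new UnsupportedOperationException(method.getName());
                }
            });
    }

    public static void main(String[] args) throws SQLException {
        UserRecord record = new UserRecord();
        record.readFields(fakeRow(42L, "alice"));
        System.out.println(record.id + " " + record.name);
    }
}
```

As long as the Mapper only consumes such a record and emits something else
(or nothing), the record itself should never have to pass through
Writable-based serialization.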

On Thu, Aug 5, 2010 at 7:07 AM, David Rosenstrauch <darose@darose.net> wrote:
> I'm working on an M/R job which uses DBInputFormat.  So I have to create my
> own DBWritable for this.  I'm a little bit confused about how to implement
> this though.
> In the sample code in the Javadoc for the DBWritable class, the MyWritable
> implements both DBWritable and Writable - thereby forcing the author of the
> MyWritable class to implement the methods to serialize/deserialize it
> to/from DataInput & DataOutput.  Without getting into too much detail,
> having to implement this serialization would add a good bit of complexity to
> my code.
> However, the DBWritable that I'm writing really doesn't need to exist beyond
> the Mapper.  I.e., it'll be input to the Mapper, but the Mapper won't emit
> it out to the sort/reduce steps.  And after doing some reading/digging
> through the code, it looks to me like the InputFormat and the Mapper always
> get run on the same host & JVM.  If that's in fact the case, then there'd be
> no need for me to make my DBWritable implement Writable also and so I could
> avoid the whole serialization/deserialization issue.
> So my question is basically:  have I got this correct?  Do the InputFormat
> and the Mapper always run in the same VM?  (In which case I can do what I'm
> planning and code the DBWritable without the serialization headaches from
> the Writable class.)
> TIA,
> DR

Harsh J
