hadoop-common-user mailing list archives

From Devajyoti Sarkar <dsar...@q-kk.com>
Subject Re: RecordReader Key/Value classes
Date Wed, 29 Jul 2009 08:37:42 GMT
Thank you for a great tip - reusing the key/value objects after
output.collect.

I have one more question. Is the map output data stored on the local disk of
the instance, or is it written out to HDFS? Specifically, if a single map
outputs more data than its local disk can hold, does the job fail (or can one
assume that the full disk space of HDFS is available)?

Cheers,
Dev


On Wed, Jul 29, 2009 at 10:06 AM, Jason Venner <jason.hadoop@gmail.com> wrote:

> In Hadoop 0.18 and beyond, the key and value do not have to implement
> Writable.
> As a general rule, the key and value objects passed to the map task will be
> the same objects on every call, with a fresh value initialized by the record
> reader each time.
> The output.collect method will serialize the key and value during the call
> (unless you are using the chain mapper support from 0.19+), so you are free
> to reset the contents of the key and value objects passed to output.collect
> after the call returns.
>
> It is common practice to keep a class field holding an instance of the
> output key or value type and to reuse it for transformations, instead of
> allocating a new key or value instance in each call to map or reduce.
>
> On Tue, Jul 28, 2009 at 11:29 AM, Devajyoti Sarkar <dsarkar@q-kk.com>
> wrote:
>
> > Thanks.
> >
> > Dev
> >
> > On Wed, Jul 29, 2009 at 2:27 AM, Todd Lipcon <todd@cloudera.com> wrote:
> >
> > > On Tue, Jul 28, 2009 at 11:24 AM, Devajyoti Sarkar <dsarkar@q-kk.com>
> > > wrote:
> > >
> > > > Hi,
> > > >
> > > > > In the Hadoop documentation it says that all key-value classes
> > > > > need to implement Writable to allow serialization and
> > > > > de-serialization of outputs between mappers and reducers. Is this
> > > > > also necessary for key/value pairs sent between the RecordReader
> > > > > and the Mapper (as well as the Reducer and the RecordWriter)? I
> > > > > assume that in each of these two cases, the classes are
> > > > > instantiated in the same VM. So is it safe to assume that
> > > > > key/value pairs are sent by reference instead of by
> > > > > serialization/deserialization? If so, my specific application may
> > > > > get a performance boost. Please do let me know if this is so.
> > > >
> > >
> > > > Yes, this is correct. The values that come out of RecordReaders and
> > > > go into RecordWriters do not need to implement Writable.
> > >
> > > -Todd
> > >
> >
>
>
>
> --
> Pro Hadoop, a book to guide you from beginner to hadoop mastery,
> http://www.amazon.com/dp/1430219424?tag=jewlerymall
> www.prohadoopbook.com a community for Hadoop Professionals
>
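
The reuse pattern described in Jason's reply can be sketched in plain Java
without any Hadoop dependency. Everything named here (ReuseSketch,
MutableText, SerializingCollector, decode) is a hypothetical stand-in
invented for illustration, not part of the Hadoop API. The only assumption
carried over from the thread is that collect() serializes the key and value
before it returns, which is what makes it safe to overwrite the same two
objects on the next call to map.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

class ReuseSketch {

    /** Hypothetical mutable holder, standing in for a reusable key/value type. */
    static final class MutableText {
        private String value = "";
        void set(String v) { value = v; }
        String get() { return value; }
    }

    /** Hypothetical collector that serializes the pair during collect(). */
    static final class SerializingCollector {
        final List<byte[]> records = new ArrayList<>();

        void collect(MutableText key, MutableText value) throws IOException {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buf);
            out.writeUTF(key.get());
            out.writeUTF(value.get());
            out.flush();
            // Only the bytes are kept; no reference to key or value is retained.
            records.add(buf.toByteArray());
        }
    }

    // Allocated once per mapper instance and reused on every call to map().
    private final MutableText outKey = new MutableText();
    private final MutableText outValue = new MutableText();

    /** Splits "key<TAB>value" and emits the pair through the reused objects. */
    void map(String line, SerializingCollector output) throws IOException {
        int tab = line.indexOf('\t');
        if (tab < 0) {
            return; // skip lines without a tab separator
        }
        outKey.set(line.substring(0, tab));
        outValue.set(line.substring(tab + 1));
        output.collect(outKey, outValue);
        // outKey and outValue may now be overwritten by the next call.
    }

    /** Reads the serialized records back as "key=value" strings. */
    static List<String> decode(SerializingCollector c) throws IOException {
        List<String> result = new ArrayList<>();
        for (byte[] record : c.records) {
            DataInputStream in =
                new DataInputStream(new ByteArrayInputStream(record));
            String key = in.readUTF();
            String value = in.readUTF();
            result.add(key + "=" + value);
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        SerializingCollector collector = new SerializingCollector();
        ReuseSketch mapper = new ReuseSketch();
        mapper.map("a\tone", collector);
        mapper.map("b\ttwo", collector);
        System.out.println(decode(collector)); // prints [a=one, b=two]
    }
}
```

If the collector kept references to the key and value instead of their
serialized bytes, both records would end up showing the last pair written.
That is the reason the reply singles out chained mappers as an exception.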
