hadoop-mapreduce-user mailing list archives

From Vyacheslav Zholudev <vyacheslav.zholu...@gmail.com>
Subject Re: How does Hadoop reuse the objects?
Date Thu, 04 Aug 2011 21:07:14 GMT
Just sharing today's discovery:
Hadoop also reuses the objects held in internal lists, in my example the BAR
objects. That is, if the first FOO object has two BAR objects in its list, then
the first two BAR objects in the second FOO object's list will be the same
(equal by reference) two objects. So in the case of Avro, it would be good if
the auto-generated code implemented a 'clone' method.
Btw, is it a good idea to clone Avro specific objects by
serializing/deserializing with SpecificDatum{Writer|Reader}?
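
To make the aliasing concrete, here is a minimal, self-contained sketch of the behaviour described above. It does not use Hadoop or Avro: Foo, Bar, ReusingIterator and deepCopy are hypothetical stand-ins, the iterator only imitates the observed reuse (it is not Hadoop's actual implementation), and Java serialization is used as a substitute for a SpecificDatumWriter/SpecificDatumReader round trip.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

class ReuseDemo {
    // Hypothetical stand-ins for the generated classes FOO and BAR.
    static class Bar implements Serializable {
        int value;
    }

    static class Foo implements Serializable {
        int a;
        List<Bar> barList = new ArrayList<>();
    }

    // Imitates the observed reuse: the same Foo is returned by every next(),
    // and Bar instances already in its list are overwritten, not replaced.
    static class ReusingIterator implements Iterator<Foo> {
        private final int[][] records;
        private final Foo reused = new Foo();
        private int pos = 0;

        ReusingIterator(int[][] records) {
            this.records = records;
        }

        @Override
        public boolean hasNext() {
            return pos < records.length;
        }

        @Override
        public Foo next() {
            if (!hasNext()) {
                throw new NoSuchElementException();
            }
            int[] rec = records[pos];
            reused.a = pos++;
            // Drop surplus elements if the new record is shorter.
            while (reused.barList.size() > rec.length) {
                reused.barList.remove(reused.barList.size() - 1);
            }
            // Overwrite existing Bar objects; allocate only when the list grows.
            for (int i = 0; i < rec.length; i++) {
                if (i == reused.barList.size()) {
                    reused.barList.add(new Bar());
                }
                reused.barList.get(i).value = rec[i];
            }
            return reused;
        }
    }

    // Deep copy via a serialize/deserialize round trip (Java serialization
    // here, standing in for an Avro writer/reader pair).
    @SuppressWarnings("unchecked")
    static <T extends Serializable> T deepCopy(T original) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(original);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                return (T) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("deep copy failed", e);
        }
    }

    public static void main(String[] args) {
        int[][] records = { {1, 2}, {10, 20, 30} };
        Iterator<Foo> it = new ReusingIterator(records);

        Foo first = it.next();
        Bar firstBar = first.barList.get(0);
        Foo firstCopy = deepCopy(first);   // copy taken before the next next()
        Foo second = it.next();

        System.out.println(first == second);                    // true
        System.out.println(firstBar == second.barList.get(0));  // true
        System.out.println(firstBar.value);                     // 10
        System.out.println(firstCopy.barList.get(0).value);     // 1
    }
}
```

The point of the sketch is only that a copy has to be taken before the next call to next(); any reference kept to the reused FOO, or to a BAR inside its list, sees the later record's values.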

Vyacheslav


On 4 August 2011 21:35, <Milind.Bhandarkar@emc.com> wrote:

> HADOOP-2399 has caused a lot of problems for users so far, and the saga
> still continues :-(
>
> I remember spending 18 straight hours in 2008 with a user debugging this
> issue.
>
> - milind
>
> ---
> Milind Bhandarkar
> Greenplum Labs, EMC
> (Disclaimer: Opinions expressed in this email are those of the author, and
> do not necessarily represent the views of any organization, past or present,
> the author might be affiliated with.)
>
>
>
>
> On 8/3/11 4:19 AM, "Joey Echeverria" <joey@cloudera.com> wrote:
>
> >Hadoop reuses objects as an optimization. If you need to keep a copy
> >in memory, you need to call clone yourself. I've never used Avro, but
> >my guess is that the BARs are not reused, only the FOO.
> >
> >-Joey
> >
> >On Wed, Aug 3, 2011 at 3:18 AM, Vyacheslav Zholudev
> ><vyacheslav.zholudev@gmail.com> wrote:
> >> Hi all,
> >>
> >> I'm using Avro as a serialization format and assume I have a generated
> >>specific class FOO that I use as a Mapper output format:
> >>
> >> class FOO {
> >>  int a;
> >>  List<BAR> barList;
> >> }
> >>
> >> where BAR is another generated specific Java class.
> >>
> >> When I iterate over "Iterable<FOO> values" in the Reducer it is clear
> >>that the same object of class FOO is reused, i.e.
> >> FOO foo1 = values.iterator().next();
> >> FOO foo2 = values.iterator().next();
> >> assertThat(foo1 == foo2, is (true));
> >>
> >> So I have the following questions:
> >> 1) Is the list barList reused over the next() calls?
> >> 2) If yes, can the objects in the barList be reused as well? For
> >>example, suppose the list contains two BAR objects after the first
> >>next() call and three after the second. Are two of those three equal by
> >>reference to the two from the first next() call? In other words, does
> >>Hadoop maintain some sort of "object pool"?
> >> 3) Why doesn't AvroTools generate clone() methods? It would be
> >>quite straightforward and, more importantly, useful given that objects
> >>are reused.
> >>
> >> Thanks a lot in advance!
> >>
> >> Vyacheslav
> >>
> >>
> >>
> >>
> >
> >
> >
> >--
> >Joseph Echeverria
> >Cloudera, Inc.
> >443.305.9434
> >
>
>


-- 
Best,
Vyacheslav Zholudev
