hadoop-mapreduce-user mailing list archives

From Joey Echeverria <j...@cloudera.com>
Subject Re: How does Hadoop reuse the objects?
Date Wed, 03 Aug 2011 11:19:53 GMT
Hadoop reuses objects as an optimization. If you need to keep a copy
in memory, you need to call clone yourself. I've never used Avro, but
my guess is that the BARs are not reused, only the FOO.
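The reuse pattern Joey describes can be illustrated with a plain-Java sketch. The `Record` class and `ReusingIterator` below are hypothetical stand-ins for a deserialized value class and Hadoop's reducer value iterator, not actual Hadoop or Avro APIs; the point is only that an iterator which deserializes into the same object forces the caller to copy before buffering.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class ReuseDemo {
    // Stand-in for a deserialized record such as FOO.
    static class Record {
        int a;
        Record copy() {              // the "clone yourself" step
            Record r = new Record();
            r.a = this.a;
            return r;
        }
    }

    // Iterator that deserializes each value into the SAME object,
    // mimicking the reuse behavior of a reducer's value iterator.
    static class ReusingIterator implements Iterator<Record> {
        private final int[] data;
        private final Record reused = new Record();
        private int pos = 0;

        ReusingIterator(int[] data) { this.data = data; }

        public boolean hasNext() { return pos < data.length; }

        public Record next() {
            reused.a = data[pos++];  // overwrite the single instance in place
            return reused;           // same reference every time
        }
    }

    public static void main(String[] args) {
        List<Record> buffered = new ArrayList<>();
        Iterator<Record> it = new ReusingIterator(new int[]{1, 2, 3});
        while (it.hasNext()) {
            // Without copy(), every element of buffered would alias the same
            // object and end up holding the last value written into it.
            buffered.add(it.next().copy());
        }
        for (Record r : buffered) {
            System.out.println(r.a);
        }
    }
}
```

Buffering `it.next()` directly instead of `it.next().copy()` would print the final value three times, which is exactly the surprise object reuse causes when values are kept past the current iteration.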


On Wed, Aug 3, 2011 at 3:18 AM, Vyacheslav Zholudev
<vyacheslav.zholudev@gmail.com> wrote:
> Hi all,
> I'm using Avro as a serialization format and assume I have a generated specific class FOO that I use as a Mapper output format:
> class FOO {
>  int a;
>  List<BAR> barList;
> }
> where BAR is another generated specific Java class.
> When I iterate over "Iterable&lt;FOO&gt; values" in the Reducer, it is clear that the same object of class FOO is reused, i.e.
> FOO foo1 = values.iterator().next();
> FOO foo2 = values.iterator().next();
> assertThat(foo1 == foo2, is(true));
> So I have the following questions:
> 1) Is the list barList reused over the next() calls?
> 2) If yes, can the objects inside the barList also be reused? For example, suppose the first time next() is called the list contains two BAR objects, and the next time it contains three, two of which are equal by reference to the two from the first call. In other words, does Hadoop maintain some sort of "object pool"?
> 3) Why doesn't AvroTools generate clone() methods, since that would be quite straightforward and, more importantly, useful given that objects are reused?
> Thanks a lot in advance!
> Vyacheslav

Joseph Echeverria
Cloudera, Inc.
