hadoop-mapreduce-user mailing list archives

From James Hammerton <james.hammer...@mendeley.com>
Subject Re: Strange behaviour from a custom Writable
Date Mon, 08 Feb 2010 22:58:35 GMT
Thanks, Ed. I'm copying the values into a list, sorting them and then
emitting the top 20, so yes, they are buffered. I'll try cloning each
item tomorrow and see if that works.

Does this mean the Iterator returns the same reference on every call to
next(), with different contents stored in that object each time? I.e. it
hands back a single buffer object that gets refilled with new contents
each time you advance the iterator?
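A plain-Java sketch of the behaviour asked about above, assuming (as Ed's reply suggests) that the reduce-side iterator refills one instance on every call to next(). `Pair` and `reusingIterable` are hypothetical stand-ins, not Hadoop classes. Buffering the returned references leaves a list whose entries are all the same object, holding the last values read.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class ReuseDemo {
    // Hypothetical mutable holder standing in for LongDoublePair.
    static final class Pair {
        long id;
        double score;

        void set(long id, double score) {
            this.id = id;
            this.score = score;
        }

        @Override
        public String toString() {
            return id + ":" + score;
        }
    }

    // Simulated reduce-side Iterable: one Pair instance is refilled on every next().
    static Iterable<Pair> reusingIterable(long[] ids, double[] scores) {
        return () -> new Iterator<Pair>() {
            private final Pair shared = new Pair();
            private int i = 0;

            @Override
            public boolean hasNext() {
                return i < ids.length;
            }

            @Override
            public Pair next() {
                shared.set(ids[i], scores[i]);
                i++;
                return shared; // the same reference every time
            }
        };
    }

    public static void main(String[] args) {
        long[] ids = {1L, 2L, 3L};
        double[] scores = {0.5, 0.9, 0.1};

        List<Pair> buffered = new ArrayList<>();
        for (Pair p : reusingIterable(ids, scores)) {
            buffered.add(p); // stores the shared reference, not a snapshot
        }
        System.out.println(buffered); // [3:0.1, 3:0.1, 3:0.1]
        System.out.println(buffered.get(0) == buffered.get(2)); // true
    }
}
```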

Regards,

James

On Mon, Feb 8, 2010 at 7:09 PM, Ed Mazur <mazur@cs.umass.edu> wrote:

> Hi James,
>
> I ran into something similar in the past and suspect the problem may
> be in your reduce function. Are you buffering values from the
> iterator? If you are, then you need to first clone the value when
> taking it from the iterator (implement Cloneable in your custom
> Writable). Otherwise they will all be references to the last item from
> the iterator.
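A minimal sketch of the fix applied to the top-20 use case, assuming a copy constructor on the value class rather than Cloneable. `Pair`, `reusing` and `topN` are hypothetical names. Each value is copied before it goes into a bounded min-heap, so later calls to next() cannot overwrite what has been kept. In real Hadoop code, `org.apache.hadoop.io.WritableUtils.clone(value, conf)` is another way to take the copy.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

public class TopNWithCopies {
    // Hypothetical stand-in for LongDoublePair, with a copy constructor.
    static final class Pair {
        long id;
        double score;

        Pair() {}

        Pair(Pair other) { // snapshot of the current contents
            this.id = other.id;
            this.score = other.score;
        }

        void set(long id, double score) { this.id = id; this.score = score; }
    }

    // Hypothetical iterable that refills one shared Pair, like a reusing reduce iterator.
    static Iterable<Pair> reusing(long[] ids, double[] scores) {
        Pair shared = new Pair();
        return () -> new Iterator<Pair>() {
            int i = 0;
            public boolean hasNext() { return i < ids.length; }
            public Pair next() { shared.set(ids[i], scores[i]); i++; return shared; }
        };
    }

    // Keeps the n highest-scoring values, copying each one before buffering it.
    static List<Pair> topN(Iterable<Pair> values, int n) {
        // Min-heap on score: the weakest kept value sits at the head.
        PriorityQueue<Pair> heap =
                new PriorityQueue<>(Comparator.comparingDouble((Pair p) -> p.score));
        for (Pair v : values) {
            heap.add(new Pair(v)); // copy: v may be overwritten by the next call to next()
            if (heap.size() > n) {
                heap.poll(); // drop the lowest score
            }
        }
        List<Pair> result = new ArrayList<>(heap);
        result.sort(Comparator.comparingDouble((Pair p) -> p.score).reversed());
        return result;
    }

    public static void main(String[] args) {
        long[] ids = {1L, 2L, 3L, 4L};
        double[] scores = {0.5, 0.9, 0.1, 0.7};
        for (Pair p : topN(reusing(ids, scores), 2)) {
            System.out.println(p.id + " " + p.score); // "2 0.9", then "4 0.7"
        }
    }
}
```

With n = 20 this keeps at most 21 copies alive at a time instead of buffering every value for the key.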
>
> Ed
>
> On Mon, Feb 8, 2010 at 12:23 PM, James Hammerton
> <james.hammerton@mendeley.com> wrote:
> > Hi,
> >
> > For a particular project I created a Writable for holding a long and a
> > double called LongDoublePair. My mapper outputs LongDoublePair values and
> > the reducer receives an Iterable<LongDoublePair>.
> >
> > The problem is that when I try to use it, whilst I get the right number of
> > elements in the Iterable, they all refer to the same object! I tested that
> > this was the case by using the following code in the loop that processes
> > the pairs:
> >
> >             if (prev != null) {
> >                 if (prev == next) {
> >                     context.getCounter("MY COUNTERS", key.toString() +
> > "Values are same object").increment(1);
> >                 }
> >             } else {
> >                 prev = next;
> >             }
> >
> > The counters appeared with all sorts of values, e.g. I got lots of lines
> > like: "10/02/08 16:57:18 INFO mapred.JobClient:     990Values are same object=46",
> > indicating that the iterator was returning the same object over and over.
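A generic sketch of the `prev == next` diagnostic above, independent of Hadoop: `countSameAsFirst` counts how many later values are the very same object as the first, and `reusing` is a hypothetical iterable that refills one StringBuilder the way a reusing iterator would. A nonzero count means the iterable hands back one shared instance.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class SameObjectCheck {
    // Counts values after the first that are identical (==) to the first,
    // mirroring the counter test in the reducer loop above.
    static <T> int countSameAsFirst(Iterable<T> values) {
        boolean seenFirst = false;
        T first = null;
        int same = 0;
        for (T v : values) {
            if (!seenFirst) {
                first = v;
                seenFirst = true;
            } else if (v == first) {
                same++;
            }
        }
        return same;
    }

    // Hypothetical iterable that refills one StringBuilder on every next().
    static Iterable<StringBuilder> reusing(List<String> items) {
        StringBuilder shared = new StringBuilder();
        return () -> new Iterator<StringBuilder>() {
            private final Iterator<String> it = items.iterator();

            public boolean hasNext() { return it.hasNext(); }

            public StringBuilder next() {
                shared.setLength(0);
                shared.append(it.next());
                return shared;
            }
        };
    }

    public static void main(String[] args) {
        List<String> items = Arrays.asList("a", "b", "c", "d");
        System.out.println(countSameAsFirst(reusing(items))); // 3
        List<StringBuilder> fresh =
                Arrays.asList(new StringBuilder("a"), new StringBuilder("b"));
        System.out.println(countSameAsFirst(fresh)); // 0
    }
}
```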
> >
> > My code works if instead of using the LongDoublePair I use a Text object and
> > simply concatenate the two number strings with a space to separate them and
> > have the reducer parse the string into a LongDoublePair and process it.
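A sketch of the Text-based workaround, assuming a single space separator as described; `TextPairCodec`, `encode` and `decode` are hypothetical names. Because `decode` builds a new Pair for every value, each parsed result is an independent object, which is presumably why this version works even if the reducer's Text instance is itself reused.

```java
public class TextPairCodec {
    // Hypothetical immutable pair produced by parsing.
    static final class Pair {
        final long id;
        final double score;

        Pair(long id, double score) {
            this.id = id;
            this.score = score;
        }
    }

    // Encodes the pair as "<long> <double>".
    static String encode(Pair p) {
        return p.id + " " + p.score;
    }

    // Parses "<long> <double>" into a fresh Pair; rejects malformed input.
    static Pair decode(String s) {
        String[] parts = s.trim().split(" ");
        if (parts.length != 2) {
            throw new IllegalArgumentException("expected \"<long> <double>\", got: " + s);
        }
        return new Pair(Long.parseLong(parts[0]), Double.parseDouble(parts[1]));
    }

    public static void main(String[] args) {
        Pair p = decode(encode(new Pair(990L, 0.25)));
        System.out.println(p.id + " " + p.score); // 990 0.25
    }
}
```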
> >
> > Via unit tests, I've ensured the LongDoublePair's serialisation and
> > deserialisation code works, that hashCode and equals do what they should do,
> > etc., but I can't seem to get this to work other than by falling back on
> > using Text objects. Any ideas what might be going wrong?
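A sketch of what a LongDoublePair along these lines might look like. The attached source is not reproduced in the archive, so the field names and the `WritableLike` interface (a stand-in declaring the same two methods as Hadoop's `Writable`, so the sketch compiles without Hadoop) are assumptions. The round trip shows why passing serialisation tests does not rule out the problem: `readFields` overwrites the instance in place, so a reference kept from an earlier read silently changes.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;

public class LongDoublePairSketch {
    // Stand-in for org.apache.hadoop.io.Writable (same two methods).
    interface WritableLike {
        void write(DataOutput out) throws IOException;
        void readFields(DataInput in) throws IOException;
    }

    // Hypothetical reconstruction, not the original attachment.
    static final class LongDoublePair implements WritableLike {
        private long first;
        private double second;

        LongDoublePair() {} // Hadoop instantiates Writables via a no-arg constructor

        LongDoublePair(long first, double second) { set(first, second); }

        void set(long first, double second) {
            this.first = first;
            this.second = second;
        }

        long getFirst() { return first; }

        double getSecond() { return second; }

        @Override
        public void write(DataOutput out) throws IOException {
            out.writeLong(first);
            out.writeDouble(second);
        }

        @Override
        public void readFields(DataInput in) throws IOException {
            // Overwrites this instance in place: this is how a reused object changes.
            first = in.readLong();
            second = in.readDouble();
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof LongDoublePair)) return false;
            LongDoublePair other = (LongDoublePair) o;
            return first == other.first && Double.compare(second, other.second) == 0;
        }

        @Override
        public int hashCode() {
            return 31 * Long.hashCode(first) + Double.hashCode(second);
        }
    }

    // Serialises src and deserialises the bytes into dst (possibly a reused instance).
    static void roundTrip(LongDoublePair src, LongDoublePair dst) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        src.write(new DataOutputStream(bytes));
        dst.readFields(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
    }

    public static void main(String[] args) throws IOException {
        LongDoublePair reused = new LongDoublePair();
        roundTrip(new LongDoublePair(1L, 0.5), reused);
        LongDoublePair kept = reused; // a buffered reference, not a copy
        roundTrip(new LongDoublePair(2L, 0.9), reused);
        System.out.println(kept.getFirst()); // 2: the "kept" value was overwritten
    }
}
```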
> >
> > I've attached the source code for LongDoublePair to this email in case you
> > can spot anything that might be behind the problem.
> >
> > James
> >
> > --
> > James Hammerton | Senior Data Mining Engineer
> > www.mendeley.com/profiles/james-hammerton
> >
> > Mendeley Limited | London, UK | www.mendeley.com
> > Registered in England and Wales | Company Number 6419015
> >
> >
> >
> >
>



