hadoop-mapreduce-user mailing list archives

From James Hammerton <james.hammer...@mendeley.com>
Subject Behaviour of reducer's Iterable in MR unit.
Date Thu, 09 Dec 2010 10:21:00 GMT

This relates to a bug we had a while back.

When running a reducer, if you want to buffer the values, you normally need
to take a copy of each value as you iterate through them. This is because
the iterator returns the same object on every call to next(); the framework
simply refills that object's contents with each successive value as the
iterator steps through.
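To make the pitfall concrete, here is a minimal, Hadoop-free sketch. It assumes a hypothetical mutable `IntBox` standing in for a Writable such as `IntWritable`, and a hypothetical `reusingIterable` helper that mimics the framework by refilling one shared instance on each `next()` call. Buffering the references directly gives a list of aliases to that one object; copying each value first gives the expected contents.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class ReducerValueReuseDemo {

    // Hypothetical mutable value type standing in for a Hadoop Writable.
    static final class IntBox {
        int value;
        IntBox(int value) { this.value = value; }
        IntBox copy() { return new IntBox(value); }
    }

    // Mimics the reducer's value iterator: one shared IntBox instance is
    // refilled with each successive value instead of allocating a new one.
    static Iterable<IntBox> reusingIterable(int... raw) {
        return () -> new Iterator<IntBox>() {
            private final IntBox shared = new IntBox(0);
            private int pos = 0;

            @Override public boolean hasNext() { return pos < raw.length; }

            @Override public IntBox next() {
                shared.value = raw[pos++];
                return shared;
            }
        };
    }

    // Buggy: stores references, so every element aliases the shared instance.
    static List<Integer> bufferWithoutCopy(Iterable<IntBox> values) {
        List<IntBox> buffered = new ArrayList<>();
        for (IntBox v : values) {
            buffered.add(v);
        }
        List<Integer> result = new ArrayList<>();
        for (IntBox v : buffered) result.add(v.value);
        return result;
    }

    // Correct: takes a copy of each value before buffering it.
    static List<Integer> bufferWithCopy(Iterable<IntBox> values) {
        List<IntBox> buffered = new ArrayList<>();
        for (IntBox v : values) {
            buffered.add(v.copy());
        }
        List<Integer> result = new ArrayList<>();
        for (IntBox v : buffered) result.add(v.value);
        return result;
    }

    public static void main(String[] args) {
        System.out.println(bufferWithoutCopy(reusingIterable(1, 2, 3))); // [3, 3, 3]
        System.out.println(bufferWithCopy(reusingIterable(1, 2, 3)));    // [1, 2, 3]
    }
}
```

In real reducer code the `copy()` call would be replaced by whatever copies the Writable in use, e.g. `new IntWritable(value.get())` or `new Text(value)`.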

However, *this behaviour is not reproduced by the reduce drivers in MRUnit*.
Even if you give the reduce driver a List designed to behave this way (and
why do we have to give a List when the reducer only asks for an Iterable?),
MRUnit copies the values into an ordinary List before presenting them to the
reducer. At least, this is the case with the 0.20.1 install we have.

Anyway, to test our bug fix we extended the ReduceDriver class so that it
copies the values into an Iterable that does reproduce this behaviour,
letting us test for bugs caused by failing to copy the values. In more
recent versions of Hadoop (we use 0.20.1), has the behaviour of the reduce
drivers been changed to match that of reducers running in a real job? If
not, are there any plans to do so? Alternatively, I'd be willing to fix this
in the Hadoop codebase myself if necessary.
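The ReduceDriver extension itself isn't shown here. Below is a minimal sketch of just the Iterable part, assuming a hypothetical `ReusingIterable<T>` that wraps the List of values and replays it through one shared instance, using a caller-supplied copy function (for a Hadoop `Text` this could be something like `(src, dest) -> dest.set(src)`). It uses no MRUnit API; how it would be hooked into a `ReduceDriver` subclass depends on the MRUnit version and is left out.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.BiConsumer;
import java.util.function.Supplier;

// Hypothetical test helper: replays a list of values through ONE shared
// instance, reproducing the object reuse of a real reducer's iterator.
public class ReusingIterable<T> implements Iterable<T> {
    private final List<T> source;
    private final Supplier<T> factory;       // creates the shared instance
    private final BiConsumer<T, T> copyInto; // copyInto.accept(src, dest)

    public ReusingIterable(List<T> source, Supplier<T> factory, BiConsumer<T, T> copyInto) {
        this.source = source;
        this.factory = factory;
        this.copyInto = copyInto;
    }

    @Override
    public Iterator<T> iterator() {
        final T shared = factory.get();
        final Iterator<T> it = source.iterator();
        return new Iterator<T>() {
            @Override public boolean hasNext() { return it.hasNext(); }

            @Override public T next() {
                if (!it.hasNext()) throw new NoSuchElementException();
                // Overwrite the shared instance rather than returning the source element.
                copyInto.accept(it.next(), shared);
                return shared;
            }
        };
    }

    public static void main(String[] args) {
        List<StringBuilder> input = Arrays.asList(
                new StringBuilder("a"), new StringBuilder("b"), new StringBuilder("c"));
        ReusingIterable<StringBuilder> values = new ReusingIterable<>(input, StringBuilder::new,
                (src, dest) -> { dest.setLength(0); dest.append(src); });

        StringBuilder first = null;
        for (StringBuilder v : values) {
            if (first == null) first = v; // a "reducer" that forgets to copy
        }
        System.out.println(first); // prints "c", not "a": the shared instance was overwritten
    }
}
```

A reducer under test that keeps references without copying will now see its buffered values collapse to the last one, just as it would in a real job.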



James Hammerton | Senior Data Mining Engineer

Mendeley Limited | London, UK | www.mendeley.com
Registered in England and Wales | Company Number 6419015
