hadoop-mapreduce-user mailing list archives

From Pedro Costa <psdc1...@gmail.com>
Subject Re: Map output in the map side is 10 bytes bigger than on the reduce?
Date Tue, 10 Aug 2010 08:23:42 GMT
I would like to add my own feature to MR to ensure that the map
output is transferred correctly to the reduce side. Besides simply
checking the CRC of the map output, I want to guarantee that its
content hasn't been tampered with. I verify the file by hashing the
map output on the map side. When the reduce task fetches the map
output, it hashes the file again and compares the two hashes. The two
hashes must be equal, but for now they aren't, because the reducer
fetches a map output that is 10 bytes smaller.
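
To make the comparison concrete, here is a minimal sketch of the check
described above. It is not the actual Hadoop shuffle code: the class and
method names (MapOutputDigest, digest, matches) are hypothetical, SHA-256
is an assumed choice of hash, and in-memory streams stand in for the map
output file and the fetched copy.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper, not part of Hadoop: hashes a stream on one side
// and verifies a second stream against that hash on the other side.
public class MapOutputDigest {

    // Reads the stream to the end and returns its SHA-256 digest (assumed algorithm).
    static byte[] digest(InputStream in) throws IOException, NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("SHA-256");
        byte[] buf = new byte[8192];
        int n;
        while ((n = in.read(buf)) != -1) {
            md.update(buf, 0, n);
        }
        return md.digest();
    }

    // Hashes the fetched bytes and compares them with the digest computed on the map side.
    static boolean matches(byte[] mapSideDigest, InputStream fetched)
            throws IOException, NoSuchAlgorithmException {
        return MessageDigest.isEqual(mapSideDigest, digest(fetched));
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for the bytes written by the map task.
        byte[] mapOutput = new byte[1024];
        for (int i = 0; i < mapOutput.length; i++) {
            mapOutput[i] = (byte) i;
        }
        byte[] mapSide = digest(new ByteArrayInputStream(mapOutput));

        // Identical bytes on the reduce side: the hashes agree.
        System.out.println(matches(mapSide, new ByteArrayInputStream(mapOutput)));

        // A copy that is 10 bytes shorter: the hashes differ.
        System.out.println(matches(mapSide,
                new ByteArrayInputStream(mapOutput, 0, mapOutput.length - 10)));
    }
}
```

The hashes can only agree if both sides hash exactly the same byte
range, so a difference of even 10 bytes makes the check fail.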

I hope that my explanation was clear.

My previous questions are still unanswered, though. :)


On Tue, Aug 10, 2010 at 1:07 AM, Allen Wittenauer
<awittenauer@linkedin.com> wrote:
> On Aug 9, 2010, at 1:27 PM, Pedro Costa wrote:
>> 2 - If I'm deducing correctly, the reduce will always fetch 10 bytes
>> fewer than the saved map output?
> Why do you care?

