hadoop-common-user mailing list archives

From Kai Voigt...@123.org>
Subject Re: Where is the hadoop-examples source code for the Sort example mapper/reducer?
Date Sat, 13 Aug 2011 15:22:10 GMT

The IdentityMapper and IdentityReducer do what their names imply: they pretty much return their
input as their output.

TeraSort relies on the sorting that is built into Hadoop's sort & shuffle phase.

So, the map() method in TeraSort looks like this:

map(offset, line) -> (line, _)

offset is the key passed to map() and is the byte offset of the line; the line itself is the value.
map() emits the line as the key, together with a value that is not needed.

reduce() looks like this:

reduce(line, values) -> (line)

Again, the input is returned as is. The sort & shuffle layer between map() and reduce()
guarantees that the keys (lines) reach the reducer in sorted order. That's why the overall
output is the sorted input.
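
To make the data flow concrete, here is a minimal sketch in plain Python (not Hadoop code). The names map_fn, reduce_fn and run_job are made up for illustration, and a plain list sort stands in for the sort & shuffle phase. It also assumes the reducer emits the line once per value so that duplicate lines survive, and that the input is ASCII, so Python's string order matches byte order.

```python
from itertools import groupby


def map_fn(offset, line):
    # The byte offset is ignored; the line becomes the key.
    # None is a placeholder for the value that is not needed.
    yield (line, None)


def reduce_fn(line, values):
    # Identity-style reducer: emit the key once per value it was seen with
    # (an assumption, so that duplicate input lines are preserved).
    for _ in values:
        yield line


def run_job(text):
    # Build (offset, line) records the way a line-based input format would.
    records = []
    offset = 0
    for line in text.splitlines():
        records.append((offset, line))
        offset += len(line.encode("utf-8")) + 1  # +1 for the newline

    # Map phase.
    mapped = [kv for off, line in records for kv in map_fn(off, line)]

    # Stand-in for sort & shuffle with a single reducer: sort by key.
    mapped.sort(key=lambda kv: kv[0])

    # Reduce phase: one reduce_fn call per distinct key, in key order.
    output = []
    for key, group in groupby(mapped, key=lambda kv: kv[0]):
        output.extend(reduce_fn(key, [value for _, value in group]))
    return output


result = run_job("pear\napple\nfig\napple")
print(result)
```

Neither map_fn nor reduce_fn does any sorting; the ordering comes entirely from the step in between.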

This is all easy when there's just one reducer. A question to check that you've followed things
so far: what's the issue with more than one reducer?
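
For anyone who wants to check their answer, here is a toy sketch in plain Python (again, not Hadoop code). toy_hash_partition is a made-up stand-in for a hash-style partitioner, range_partition is a made-up stand-in for a partitioner that assigns key ranges to reducers in order, and the split point "c" is picked by hand for this tiny input.

```python
from bisect import bisect_right


def toy_hash_partition(key, num_reducers):
    # Toy stand-in for a hash partitioner: which reducer gets a key
    # has nothing to do with where the key falls in sort order.
    return sum(key.encode("utf-8")) % num_reducers


def range_partition(key, split_points):
    # Keys below the first split point go to reducer 0, keys from the
    # first split point up to the second go to reducer 1, and so on.
    return bisect_right(split_points, key)


def run_reducers(keys, num_reducers, partition):
    # Each reducer receives only its own keys, in sorted order.
    buckets = [[] for _ in range(num_reducers)]
    for key in keys:
        buckets[partition(key)].append(key)
    return [sorted(bucket) for bucket in buckets]


keys = ["d", "a", "c", "b"]

hashed = run_reducers(keys, 2, lambda k: toy_hash_partition(k, 2))
ranged = run_reducers(keys, 2, lambda k: range_partition(k, ["c"]))

print(hashed)  # each reducer's output is sorted on its own
print(ranged)  # reducer outputs are also ordered relative to each other
```

In both runs every reducer's own output is sorted. Only with the range-style partitioner does reading the reducer outputs one after another give a globally sorted result.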


On 13.08.2011 at 17:10, Sean Hogan wrote:

> Thanks for the link, but it hasn't helped answer my original question - that
> Sort.java seems to use IdentityMapper and IdentityReducer. Perhaps it is the
> Sort.java that is used when executing the below command, but I can't figure
> out what it actually uses for the mapper and reducer. It's entirely possible
> I'm just missing something obvious.
> I'm interested in seeing how the map and reduce fits into sorting with the
> following command:
> $ hadoop jar hadoop-*-examples.jar sort input output
> I'd appreciate it if someone could explain what mappers/reducers are used in
> that above command (link to the implementation of whatever sort they use and
> how it fits into MapReduce)
> Thanks.
> -Sean

Kai Voigt
