hadoop-common-user mailing list archives

From "Owen O'Malley" <omal...@apache.org>
Subject Re: How to modify hadoop-wordcount example to display File-wise results.
Date Mon, 06 Oct 2008 06:08:20 GMT
On Sun, Oct 5, 2008 at 12:46 PM, Ted Dunning <ted.dunning@gmail.com> wrote:

> What you need to do is snag access to the filename in the configure method
> of the mapper.

You can also do it in the map method with:

((FileSplit) reporter.getInputSplit()).getPath()

> Then instead of outputting just the word as the key, output a pair
> containing the word and the file name as the key.  Everything downstream
> should remain the same.
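
To make the composite-key idea concrete, here is a rough plain-Java sketch with no Hadoop dependency. It only simulates the map and reduce steps in memory. The names FileWiseWordCount, FileWord, and count are made up for illustration and are not part of the wordcount example or of Hadoop.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Objects;
import java.util.TreeMap;

// Plain-Java simulation (not Hadoop code) of counting words per (file, word) key.
class FileWiseWordCount {

  // Hypothetical composite key standing in for the (word, file name) pair.
  static final class FileWord implements Comparable<FileWord> {
    final String fileName;
    final String word;

    FileWord(String fileName, String word) {
      this.fileName = fileName;
      this.word = word;
    }

    @Override
    public boolean equals(Object o) {
      if (!(o instanceof FileWord)) return false;
      FileWord other = (FileWord) o;
      return fileName.equals(other.fileName) && word.equals(other.word);
    }

    @Override
    public int hashCode() {
      return Objects.hash(fileName, word);
    }

    // Sort by file name first, then by word.
    @Override
    public int compareTo(FileWord other) {
      int c = fileName.compareTo(other.fileName);
      return c != 0 ? c : word.compareTo(other.word);
    }

    @Override
    public String toString() {
      return fileName + "\t" + word;
    }
  }

  // The "map" step emits ((file, word), 1); the "reduce" step sums per key.
  static Map<FileWord, Integer> count(Map<String, String> filesToContents) {
    Map<FileWord, Integer> counts = new TreeMap<>();
    for (Map.Entry<String, String> file : filesToContents.entrySet()) {
      for (String word : file.getValue().trim().split("\\s+")) {
        if (word.isEmpty()) continue; // skip empty files
        counts.merge(new FileWord(file.getKey(), word), 1, Integer::sum);
      }
    }
    return counts;
  }

  public static void main(String[] args) {
    Map<String, String> input = new LinkedHashMap<>();
    input.put("a.txt", "foo bar foo");
    input.put("b.txt", "foo baz");
    count(input).forEach((key, n) -> System.out.println(key + "\t" + n));
  }
}
```

Because the file name is part of the key, the same word in two files is counted separately, and the summing logic is unchanged from the plain word count.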

If you want to have each file handled by a single reduce, I'd suggest:

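class FileWordPair implements Writable {
  private Text fileName;
  private Text word;
  // write() and readFields() omitted; as a map output key it also needs a sort order
  @Override
  public int hashCode() {
    // Hash on the file name only, ignoring the word.
    return fileName.hashCode();
  }
}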

so that the HashPartitioner will send all the records for file Foo to a single
reducer. This would also make a good example of when to use a grouping
comparator, which gets you a single call to reduce for each file.
-- Owen
