hadoop-common-user mailing list archives

From Latha <usla...@gmail.com>
Subject How to modify hadoop-wordcount example to display File-wise results.
Date Sun, 05 Oct 2008 17:12:42 GMT

Hi, I am trying to modify the WordCount.java from "Example: WordCount".
I would like the output to look like this:

FileOne    word1  itsCount
FileOne    word2  itsCount
  ... (and so on)
FileTwo    word1  itsCount
FileTwo    wordx  itsCount
FileThree  word1  itsCount

I am trying to make the following changes to WordCount.java:

1)  private Text filename = new Text();  // Added this to the Map class. Not
sure whether I have access to the filename here.
2)  (line 18) OutputCollector<Text, Text, IntWritable> output  // Changed the
output argument of map() to carry an extra Text field.
3)  (line 23) output.collect(filename, word, one);  // Trying to change the
output format to 'filename word count'.
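Regarding changes 2) and 3): in the `org.apache.hadoop.mapred` API, `OutputCollector<K, V>` has exactly two type parameters and `collect(K, V)` takes two arguments, so a three-argument version will not compile. One common workaround is to fold the file name and the word into a single key such as "FileOne<TAB>word1" and keep `IntWritable` as the value. The sketch below is a plain-Java simulation of that map step, with no Hadoop dependency. The class and method names (`FileWordKeySketch`, `fileName`, `mapLine`) and the sample path are hypothetical. In a real job the path would have to come from the task's input split (a `FileSplit` exposes it via `getPath()`), and the key would be a `Text`.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;

// Hypothetical plain-Java simulation of a WordCount-style map step that
// emits a composite "fileName<TAB>word" key. Not Hadoop API code.
public class FileWordKeySketch {

    // Hypothetical helper: keep only the last path component,
    // e.g. "/user/data/FileOne" -> "FileOne".
    static String fileName(String path) {
        int slash = path.lastIndexOf('/');
        return slash >= 0 ? path.substring(slash + 1) : path;
    }

    // Simulates one map() call: tokenize the line as WordCount does, but put
    // "fileName<TAB>word" in the key so a two-field (key, value) pair can
    // still carry all three output columns. The value is always 1.
    static List<Map.Entry<String, Integer>> mapLine(String path, String line) {
        String file = fileName(path);
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        StringTokenizer tokenizer = new StringTokenizer(line);
        while (tokenizer.hasMoreTokens()) {
            out.add(new SimpleEntry<>(file + "\t" + tokenizer.nextToken(), 1));
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical input path and line, for illustration only.
        for (Map.Entry<String, Integer> kv : mapLine("/user/data/FileOne", "hello world hello")) {
            System.out.println(kv.getKey() + "\t" + kv.getValue());
        }
    }
}
```

Because the file name travels inside the key, the reducer from the original example can stay unchanged: it still receives one key and a list of counts to sum.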

I am not sure what other changes are needed to produce the required output.
In particular, the filename is not available to the map method.
My requirement is to go through all the data available in HDFS and prepare
an index file in <filename word count> format.
Could you please shed some light on how I can achieve this?
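For the index file itself, a minimal sketch of the reduce side, again as a plain-Java simulation rather than Hadoop code: counts are summed per "file<TAB>word" key, and each result is written as key, tab, count. Since Hadoop's `TextOutputFormat` by default separates key and value with a tab, a composite key like this would yield "filename word count" lines. The `TreeMap` only loosely mirrors Hadoop sorting map output by key before the reduce; with a tab after the file name, all words of one file end up adjacent. The names `FileWordIndexSketch` and `buildIndex` are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical plain-Java simulation of the shuffle + reduce step that turns
// ("fileName<TAB>word", 1) pairs into "fileName<TAB>word<TAB>count" lines.
public class FileWordIndexSketch {

    static List<String> buildIndex(List<Map.Entry<String, Integer>> mapped) {
        // Sum the counts for each composite key; TreeMap keeps keys sorted.
        TreeMap<String, Integer> totals = new TreeMap<>();
        for (Map.Entry<String, Integer> kv : mapped) {
            totals.merge(kv.getKey(), kv.getValue(), Integer::sum);
        }
        List<String> lines = new ArrayList<>();
        for (Map.Entry<String, Integer> e : totals.entrySet()) {
            // The key already holds "file<TAB>word"; append the count as a third column.
            lines.add(e.getKey() + "\t" + e.getValue());
        }
        return lines;
    }

    public static void main(String[] args) {
        // Hypothetical map output, for illustration only.
        List<Map.Entry<String, Integer>> mapped = List.of(
            Map.entry("FileTwo\tword1", 1),
            Map.entry("FileOne\tword2", 1),
            Map.entry("FileOne\tword1", 1),
            Map.entry("FileOne\tword1", 1));
        buildIndex(mapped).forEach(System.out::println);
    }
}
```

Running `main` prints "FileOne word1 2", "FileOne word2 1" and "FileTwo word1 1" (tab-separated), in that order.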

