hadoop-common-user mailing list archives

From Ajay Srivastava <Ajay.Srivast...@guavus.com>
Subject Re: How to modify hadoop-wordcount example to display File-wise results.
Date Fri, 30 Mar 2012 00:57:14 GMT
Hi Aaron,
I think this can be done with counters.
You can define a counter for each node in your cluster and then, in the map method, increment the
node-specific counter after checking the hostname or IP address.
It's not a very good solution, as you will need to modify your code whenever a node is added to or removed
from the cluster, and there will be as many if conditions in the code as there are nodes. You can try
this out if you do not find a cleaner solution. I wish this counter had been
part of the predefined counters.
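
The per-node counter idea can be sketched without one if condition per node by using the host name itself as the counter name. The class below is a plain-Java stand-in only: HostByteTally and its methods are made-up names for illustration, and no Hadoop classes are used. As far as I know, the old mapred API lets a counter name be chosen at run time through Reporter.incrCounter(String group, String counter, long amount), so treat that mapping as an assumption to verify against your Hadoop version.

```java
import java.util.Map;
import java.util.TreeMap;

// Plain-Java stand-in for per-node counters; HostByteTally is a hypothetical name.
class HostByteTally {
    // Host name -> bytes seen so far. TreeMap keeps the report order stable.
    private final Map<String, Long> bytesByHost = new TreeMap<>();

    // In a real mapper the assumed equivalent of this step would be
    // reporter.incrCounter("BytesPerHost", host, bytes), with the host name
    // looked up once, for example via InetAddress.getLocalHost().getHostName().
    void record(String host, long bytes) {
        bytesByHost.merge(host, bytes, Long::sum);
    }

    // Bytes recorded for one host, or 0 if the host was never seen.
    long bytesFor(String host) {
        return bytesByHost.getOrDefault(host, 0L);
    }

    // Copy of the current totals, safe to print or iterate.
    Map<String, Long> snapshot() {
        return new TreeMap<>(bytesByHost);
    }

    public static void main(String[] args) {
        HostByteTally tally = new HostByteTally();
        tally.record("node1", 120);
        tally.record("node2", 80);
        tally.record("node1", 30);
        System.out.println(tally.snapshot());
    }
}
```

Because the key is whatever host name the task reports, adding or removing a node would not require a code change in this sketch.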


Regards,
Ajay Srivastava


On 30-Mar-2012, at 12:49 AM, aaron_v wrote:

> 
> Hi people, I am new to Nabble and Hadoop. I was having a look at the wordcount
> program. Can someone please let me know how to find which data gets mapped
> to which node? That is, I have a master node 0 and 4 other nodes 1-4,
> and I ran the wordcount successfully. But I would like to print, for each
> node, how much data it got from the input data file. Any suggestions?
> 
> us latha wrote:
>> 
>> Hi,
>> 
>> Inside the map method, I made the following change to Example: WordCount
>> v1.0 <http://hadoop.apache.org/core/docs/current/mapred_tutorial.html#Example%3A+WordCount+v1.0> at
>> http://hadoop.apache.org/core/docs/current/mapred_tutorial.html
>> ------------------
>> String filename = new String();
>> ...
>> filename =  ((FileSplit) reporter.getInputSplit()).getPath().toString();
>> while (tokenizer.hasMoreTokens()) {
>>            word.set(tokenizer.nextToken()+" "+filename);
>> --------------------
>> 
>> Worked great!! Thanks to everyone!
>> 
>> Regards,
>> Srilatha
>> 
>> 
>> On Sat, Oct 18, 2008 at 6:24 PM, Latha <uslatha@gmail.com> wrote:
>> 
>>> Hi All,
>>> 
>>> Thank you for your valuable suggestions on possible ways
>>> to create an index file in the following format:
>>> word1 filename count
>>> word2 filename count
>>> 
>>> However, the following is not working for me. Please help me resolve
>>> it.
>>> 
>>> --------------------------
>>> public static class Map extends MapReduceBase
>>>         implements Mapper<LongWritable, Text, Text, Text> {
>>>     private Text word = new Text();
>>>     private Text filename = new Text();
>>> 
>>>     public void map(LongWritable key, Text value,
>>>             OutputCollector<Text, Text> output, Reporter reporter)
>>>             throws IOException {
>>>         // Path of the file this input split came from.
>>>         filename.set(((FileSplit) reporter.getInputSplit()).getPath().toString());
>>>         String line = value.toString();
>>>         StringTokenizer tokenizer = new StringTokenizer(line);
>>>         while (tokenizer.hasMoreTokens()) {
>>>             word.set(tokenizer.nextToken());
>>>             output.collect(word, filename);
>>>         }
>>>     }
>>> }
>>> 
>>> public static class Reduce extends MapReduceBase
>>>         implements Reducer<Text, Text, Text, Text> {
>>>     public void reduce(Text key, Iterator<Text> values,
>>>             OutputCollector<Text, Text> output, Reporter reporter)
>>>             throws IOException {
>>>         int sum = 0;
>>>         Text filename = new Text();  // must be initialized before set() is called
>>>         while (values.hasNext()) {
>>>             sum++;
>>>             filename.set(values.next().toString());
>>>         }
>>>         String file = filename.toString() + " " + sum;
>>>         filename = new Text(file);
>>>         output.collect(key, filename);
>>>     }
>>> }
>>> 
>>> --------------------------
>>> 08/10/18 05:38:25 INFO mapred.JobClient: Task Id : task_200810170342_0010_m_000000_2, Status : FAILED
>>> java.io.IOException: Type mismatch in value from map: expected org.apache.hadoop.io.IntWritable, recieved org.apache.hadoop.io.Text
>>>        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:427)
>>>        at org.myorg.WordCount$Map.map(WordCount.java:23)
>>>        at org.myorg.WordCount$Map.map(WordCount.java:13)
>>>        at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:47)
>>>        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:219)
>>>        at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2122)
>>> 
>>> 
>>> Thanks
>>> Srilatha
>>> 
>>> 
>>> 
>>> On Mon, Oct 6, 2008 at 11:38 AM, Owen O'Malley <omalley@apache.org>
>>> wrote:
>>> 
>>>> On Sun, Oct 5, 2008 at 12:46 PM, Ted Dunning <ted.dunning@gmail.com>
>>>> wrote:
>>>> 
>>>>> What you need to do is snag access to the filename in the configure method
>>>>> of the mapper.
>>>> 
>>>> 
>>>> You can also do it in the map method with:
>>>> 
>>>> ((FileSplit) reporter.getInputSplit()).getPath()
>>>> 
>>>> 
>>>>> Then instead of outputting just the word as the key, output a pair
>>>>> containing the word and the file name as the key.  Everything downstream
>>>>> should remain the same.
>>>> 
>>>> 
>>>> If you want to have each file handled by a single reduce, I'd suggest:
>>>> 
>>>> class FileWordPair implements Writable {
>>>> private Text fileName;
>>>> private Text word;
>>>> ...
>>>> public int hashCode() {
>>>>    return fileName.hashCode();
>>>> }
>>>> }
>>>> 
>>>> so that the HashPartitioner will send the records for file Foo to a single
>>>> reducer. It would make sense to use this as an example for when to use
>>>> grouping comparators (for getting a single call to reduce for each file)
>>>> too...
>>>> 
>>>> -- Owen
>>>> 
>>> 
>>> 
>> 
>> 
> 
> -- 
> View this message in context: http://old.nabble.com/How-to-modify-hadoop-wordcount-example-to-display-File-wise-results.-tp19826857p33544888.html
> Sent from the Hadoop core-user mailing list archive at Nabble.com.
> 
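
The FileWordPair suggestion quoted above (hash only on the file name so that one reducer sees all records for a file) can be sketched in plain Java as follows. This is a minimal illustration under stated assumptions, not Owen's actual class: the fields are plain Strings rather than Text, the partitionFor helper is a made-up name, and a real Hadoop key would also need to implement WritableComparable with write, readFields and compareTo. The arithmetic in partitionFor mirrors what HashPartitioner does with the key's hashCode.

```java
// Plain-Java sketch of a composite (file, word) key; no Hadoop classes are used.
final class FileWordPair {
    final String fileName;
    final String word;

    FileWordPair(String fileName, String word) {
        this.fileName = fileName;
        this.word = word;
    }

    // Hash on the file name only, so every word of a file lands in one partition.
    @Override
    public int hashCode() {
        return fileName.hashCode();
    }

    // Equality still uses both fields, so distinct words remain distinct keys.
    @Override
    public boolean equals(Object o) {
        if (!(o instanceof FileWordPair)) {
            return false;
        }
        FileWordPair other = (FileWordPair) o;
        return fileName.equals(other.fileName) && word.equals(other.word);
    }

    // Hypothetical helper: clear the sign bit, then take the remainder,
    // which is the same arithmetic HashPartitioner applies to a key's hash.
    static int partitionFor(FileWordPair key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    public static void main(String[] args) {
        FileWordPair a = new FileWordPair("Foo", "hello");
        FileWordPair b = new FileWordPair("Foo", "world");
        System.out.println(partitionFor(a, 4) == partitionFor(b, 4));
    }
}
```

Keeping equals on both fields while hashing on one is what lets the records stay separate per word and still travel to the same reducer.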

