hadoop-common-user mailing list archives

From Ted Dunning <tdunn...@veoh.com>
Subject Re: Reduce Output
Date Mon, 14 Apr 2008 17:48:30 GMT

The format of the reduce output is the responsibility of the reducer.  You
can store the output any way you like.
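
For example, to get space-separated values, the reducer can build one string per key and emit that string as the value. The sketch below is plain Java with no Hadoop dependency; `ReduceValueJoiner` and `joinValues` are hypothetical names used only for illustration. In an actual reducer you would presumably declare the output value type as `Text` rather than `IntWritable` and call `output.collect(key, new Text(joined))` with the original key.

```java
import java.util.Arrays;
import java.util.Iterator;

// Hypothetical helper, not part of Hadoop: builds the value string
// that a reducer could emit for one key.
class ReduceValueJoiner {

    // Joins all values for one key into a single string, placing
    // `separator` between values and nothing after the last one.
    static String joinValues(Iterator<Integer> values, String separator) {
        StringBuilder joined = new StringBuilder();
        while (values.hasNext()) {
            if (joined.length() > 0) {
                joined.append(separator);
            }
            joined.append(values.next());
        }
        return joined.toString();
    }

    public static void main(String[] args) {
        // Stand-in for the values Hadoop would pass to reduce() for one key.
        Iterator<Integer> values = Arrays.asList(316, 475, 847).iterator();
        // A tab between key and value mimics the layout of the text output shown in this thread.
        System.out.println("1.0.2.9" + "\t" + joinValues(values, " "));
    }
}
```

Because the separator is added only between values, the line has no trailing space, which keeps it easy to split in a later script.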


On 4/14/08 10:17 AM, "Natarajan, Senthil" <senthil@pitt.edu> wrote:

> Thanks Ted.
> 
> Actually, I was trying the third option myself before posting this
> question.
> The problem is that I couldn't get the reduce output to look like this:
> 
> 1.0.2.92        206 475
> 1.0.2.9 316 475 847
> 
> If the values were separated by a space or some other delimiter, I could
> use a sequential script to iterate over them.
> 
> But the values look like this in the reduce output:
> 1.0.2.92        206475
> 1.0.2.9 316475847
> 
> So do you know of any class or method that I can use to have the values
> separated by a space or some other separator?
> 
> Thanks,
> Senthil
> 
> -----Original Message-----
> From: Ted Dunning [mailto:tdunning@veoh.com]
> Sent: Monday, April 14, 2008 12:47 PM
> To: core-user@hadoop.apache.org
> Subject: Re: Reduce Output
> 
> 
> Write an additional map-reduce step to join the data items together by
> treating different input files differently.
> 
> OR
> 
> Write an additional map-reduce step that reads in your string values in the
> map configuration method and keeps them in memory for looking up as you pass
> over the output of your previous reduce step.  You won't need a reducer for
> this approach, but your conversion table will have to fit into memory.
> 
> OR
> 
> Write a sequential script to read your string values and iterate over the
> reduce output using conventional methods.  This works very well if you can
> process your data in less time than hadoop takes to start your job.
> 
> 
> 
> 
> On 4/14/08 9:42 AM, "Natarajan, Senthil" <senthil@pitt.edu> wrote:
> 
>> Hi,
>> 
>> I have the reduce output like this.
>> 
>> 1.0.2.92                206475
>> 
>> 1.0.2.9                   316475847
>> 
>> 1.0.3.93                3846495
>> 
>> 1.0.4.93                316975
>> 
>> 
>> 
>> But I want to display it like this:
>> 
>> 1.0.2.92                206 475
>> 
>> 1.0.2.9                   316 475 847
>> 
>> 1.0.3.93                384 6495
>> 
>> 1.0.4.93                316 975
>> 
>> 
>> 
>> And each value has a description associated with it, something like this:
>> 
>> 
>> 
>> 206         ->            TextDesp206
>> 
>> 475         ->            TextDesp475
>> 
>> 316         ->            TextDesp316
>> 
>> 847         ->            TextDesp847
>> 
>> 
>> 
>> So eventually I would like to see my output look like this
>> 
>> 
>> 
>> 1.0.2.92                TextDesp206 -> TextDesp475
>> 1.0.2.9                   TextDesp316 -> TextDesp475 -> TextDesp847
>> 
>> How can I do this? I tried different ways, but had no luck.
>> 
>> public static class Reduce extends MapReduceBase
>>     implements Reducer<Text, IntWritable, Text, IntWritable> {
>> 
>>   public void reduce(Text key, Iterator<IntWritable> values,
>>       OutputCollector<Text, IntWritable> output, Reporter reporter)
>>       throws IOException {
>>     Text word = new Text();
>>     String sum = "";
>>     while (values.hasNext()) {
>>       sum += values.next().get() + " ";
>>     }
>>     //output.collect(key, new IntWritable(Integer.parseInt(sum)));
>>     word.set(sum);
>>     output.collect(word, new IntWritable(Integer.parseInt(key.toString())));
>>   }
>> }
>> 
>> 
>> 
>> Is there any way to use Reducer and OutputCollector, or any other classes,
>> to produce output like this?
>> 
>> 
>> 
>> 1.0.2.92                TextDesp206 -> TextDesp475
>> 
>> 1.0.2.9                   TextDesp316 -> TextDesp475 -> TextDesp847
>> 
>> 
>> 
>> 
>> 
>> Thanks,
>> Senthil
> 
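
The in-memory lookup described in the second and third options quoted above can be sketched as follows. This is a minimal illustration under stated assumptions, not a definitive implementation: `DescriptionLookup`, `describe`, and `convertLine` are hypothetical names, the input is assumed to be tab-separated `key<TAB>codes` lines with space-separated codes, and codes missing from the table are passed through unchanged. The same logic could run in a sequential script or inside a mapper that loads the table once in its configuration method.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: replaces numeric codes with descriptions held in memory.
class DescriptionLookup {

    // Replaces each whitespace-separated code with its description and joins
    // the results with " -> ". Unknown codes are kept as-is (an assumption).
    static String describe(String codes, Map<String, String> table) {
        List<String> parts = new ArrayList<>();
        for (String code : codes.trim().split("\\s+")) {
            if (code.isEmpty()) {
                continue; // an empty value list yields an empty description
            }
            parts.add(table.getOrDefault(code, code));
        }
        return String.join(" -> ", parts);
    }

    // Converts one "key<TAB>codes" line; lines without a tab are returned unchanged.
    static String convertLine(String line, Map<String, String> table) {
        int tab = line.indexOf('\t');
        if (tab < 0) {
            return line;
        }
        return line.substring(0, tab) + "\t" + describe(line.substring(tab + 1), table);
    }

    public static void main(String[] args) {
        // Stand-in for the conversion table, which must fit in memory.
        Map<String, String> table = new HashMap<>();
        table.put("206", "TextDesp206");
        table.put("475", "TextDesp475");
        System.out.println(convertLine("1.0.2.92\t206 475", table));
    }
}
```

This only works if the reduce output already separates the codes, since a fused value such as 206475 cannot be split reliably after the fact.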

