hadoop-common-user mailing list archives

From Ted Dunning <tdunn...@veoh.com>
Subject Re: Reduce Output
Date Tue, 15 Apr 2008 15:36:19 GMT

Just count the items in your reducer.
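
A minimal sketch of what counting in the reducer could look like, using the sample values from the question. It is written as plain Java so it runs without Hadoop on the classpath. The class name ValueCounts and the method formatCounts are hypothetical helpers made up for illustration, not part of any Hadoop API, and the sketch assumes the distinct values for one key fit in memory.

```java
import java.util.Iterator;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of Hadoop: counts how often each value
// occurs for one key and formats the result as "value(count) value(count) ...".
class ValueCounts {

    // In a real reducer the iterator would yield IntWritable objects and
    // each value would be read with values.next().get() instead.
    static String formatCounts(Iterator<Integer> values) {
        // TreeMap keeps the distinct values in ascending order,
        // which matches the ordering shown in the desired output.
        Map<Integer, Integer> counts = new TreeMap<>();
        while (values.hasNext()) {
            counts.merge(values.next(), 1, Integer::sum);
        }

        // Build a space-separated list such as "371(2) 468(5) 512(3)".
        StringBuilder line = new StringBuilder();
        for (Map.Entry<Integer, Integer> entry : counts.entrySet()) {
            if (line.length() > 0) {
                line.append(' ');
            }
            line.append(entry.getKey())
                .append('(')
                .append(entry.getValue())
                .append(')');
        }
        return line.toString();
    }
}
```

Inside the reduce method, the reducer would presumably be declared with Text, Text as its output types (as suggested earlier in the thread) and emit the formatted string with something like output.collect(key, new Text(line)).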


On 4/15/08 6:18 AM, "Natarajan, Senthil" <senthil@pitt.edu> wrote:

> Thanks Ted that worked.
> 
> I have one more question.
> 
> Now my Reduce output looks something like this.
> 
> K1      v1 v1 v1
> K2      v2 v3 v3 v2 v2
> 
> I would like to have it in this way
> 
> K1      v1(3)
> K2      v2(3) v3(2)
> 
> Example:
> 
> 8.14.0.2_12904   371  371 371
> 1.7.0.1_50098    468 468 468 468 371  371 468 512 512 512
> 
> 8.14.0.2_12904   371(3)
> 1.7.0.1_50098    371(2) 468(5) 512(3)
> 
> 
> Is there an easy way to do this in Hadoop, other than the conventional way of
> writing a script that sequentially parses each line and iterates over the values?
> 
> Thanks,
> Senthil
> 
> -----Original Message-----
> From: Ted Dunning [mailto:tdunning@veoh.com]
> Sent: Monday, April 14, 2008 2:20 PM
> To: core-user@hadoop.apache.org
> Subject: Re: Reduce Output
> 
> 
> 
> Try using Text, Text as the output type and use something like a
> StringBuffer or Formatter to construct a tab-separated list.
> 
> 
> On 4/14/08 11:13 AM, "Natarajan, Senthil" <senthil@pitt.edu> wrote:
> 
>> Could you please let me know or point out how to store the output of reduce
>> in this format
>> K1      v1 v2
>> K2    v1 v2 v3 v4
>> K3      v1
>> K4      v1 v2
>> 
>> Right now I am getting this format
>> K1      v1v2
>> K2    v1v2v3v4
>> K3      v1
>> K4      v1v2
>> 
>> Here is the Reduce class, what needs to be changed here?
>> 
>> public static class Reduce extends MapReduceBase implements Reducer<Text,
>> IntWritable, Text, IntWritable> {
>>       public void reduce(Text key, Iterator<IntWritable> values,
>> OutputCollector<Text, IntWritable> output, Reporter reporter) throws
>> IOException {
>>          int sum = 0;
>>         while (values.hasNext()) {
>>           sum += values.next().get() ;
>>         }
>>         output.collect(key, new IntWritable(sum));
>>       }
>> 
>>     }
>> 
>> 
>> 
>> -----Original Message-----
>> From: Ted Dunning [mailto:tdunning@veoh.com]
>> Sent: Monday, April 14, 2008 1:49 PM
>> To: core-user@hadoop.apache.org
>> Subject: Re: Reduce Output
>> 
>> 
>> The format of the reduce output is the responsibility of the reducer.  You
>> can store the output any way you like.
>> 
>> 
>> On 4/14/08 10:17 AM, "Natarajan, Senthil" <senthil@pitt.edu> wrote:
>> 
>>> Thanks Ted.
>>> 
>>> Actually I was trying to do the third option by myself before posting this
>>> question.
>>> Problem is I couldn't get the Reduce output like this
>>> 
>>> 1.0.2.92        206 475
>>> 1.0.2.9 316 475 847
>>> 
>>> I would like the values separated by a space or some other delimiter so that
>>> I can use a sequential script to iterate over them.
>>> 
>>> But the problem is the values are like this in the reduce output
>>> 1.0.2.92        206475
>>> 1.0.2.9 316475847
>>> 
>>> So do you know of any class or method that I can use to have the values
>>> separated by a space or any other separator?
>>> 
>>> Thanks,
>>> Senthil
>>> 
>>> -----Original Message-----
>>> From: Ted Dunning [mailto:tdunning@veoh.com]
>>> Sent: Monday, April 14, 2008 12:47 PM
>>> To: core-user@hadoop.apache.org
>>> Subject: Re: Reduce Output
>>> 
>>> 
>>> Write an additional map-reduce step to join the data items together by
>>> treating different input files differently.
>>> 
>>> OR
>>> 
>>> Write an additional map-reduce step that reads in your string values in the
>>> map configuration method and keeps them in memory for looking up as you pass
>>> over the output of your previous reduce step.  You won't need a reducer for
>>> this approach, but your conversion table will have to fit into memory.
>>> 
>>> OR
>>> 
>>> Write a sequential script to read your string values and iterate over the
>>> reduce output using conventional methods.  This works very well if you can
>>> process your data in less time than hadoop takes to start your job.
>>> 
>>> 
>>> 
>>> 
>>> On 4/14/08 9:42 AM, "Natarajan, Senthil" <senthil@pitt.edu> wrote:
>>> 
>>>> Hi,
>>>> 
>>>> I have the reduce output like this.
>>>> 
>>>> 1.0.2.92                206475
>>>> 
>>>> 1.0.2.9                   316475847
>>>> 
>>>> 1.0.3.93                3846495
>>>> 
>>>> 1.0.4.93                316975
>>>> 
>>>> 
>>>> 
>>>> But I want to display like this...
>>>> 
>>>> 1.0.2.92                206 475
>>>> 
>>>> 1.0.2.9                   316 475 847
>>>> 
>>>> 1.0.3.93                384 6495
>>>> 
>>>> 1.0.4.93                316 975
>>>> 
>>>> 
>>>> 
>>>> And each value has a description associated with it, something like this
>>>> 
>>>> 
>>>> 
>>>> 206         ->            TextDesp206
>>>> 
>>>> 475         ->            TextDesp475
>>>> 
>>>> 316         ->            TextDesp316
>>>> 
>>>> 847         ->            TextDesp847
>>>> 
>>>> 
>>>> 
>>>> So eventually I would like to see my output look like this
>>>> 
>>>> 
>>>> 
>>>> 1.0.2.92                TextDesp206 -> TextDesp475
>>>> 1.0.2.9                   TextDesp316 -> TextDesp475 -> TextDesp847
>>>> 
>>>> How can I do this? I tried different ways, but had no luck.
>>>> 
>>>> public static class Reduce extends MapReduceBase implements Reducer<Text,
>>>> IntWritable, Text, IntWritable> {
>>>>       public void reduce(Text key, Iterator<IntWritable> values,
>>>> OutputCollector<Text, IntWritable> output, Reporter reporter) throws
>>>> IOException {
>>>>          Text word = new Text();
>>>>         String sum = "";
>>>>         while (values.hasNext()) {
>>>>            sum += values.next().get() + " ";
>>>>         }
>>>>         //output.collect(key, new IntWritable(Integer.parseInt(sum)));
>>>>         word.set(sum);
>>>>         output.collect(word, new
>>>> IntWritable(Integer.parseInt(key.toString())));
>>>>       }
>>>>     }
>>>> 
>>>> 
>>>> 
>>>> Is there any way to use Reducer and OutputCollector or any other classes to
>>>> output like this
>>>> 
>>>> 
>>>> 
>>>> 1.0.2.92                TextDesp206 -> TextDesp475
>>>> 
>>>> 1.0.2.9                   TextDesp316 -> TextDesp475 -> TextDesp847
>>>> 
>>>> 
>>>> 
>>>> 
>>>> 
>>>> Thanks,
>>>> Senthil
>>> 
>> 
> 

