From: Ted Dunning
Date: Tue, 15 Apr 2008 08:36:19 -0700
Subject: Re: Reduce Output
Reply-To: core-user@hadoop.apache.org

Just count the items in your reducer.
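As a rough sketch of what counting in the reducer could look like (this code is not from the thread; the class name ValueCounts and the method format are made up for illustration), the helper below tallies each distinct value and builds a tab-separated "value(count)" list. It is plain Java with no Hadoop dependency so it can be run on its own. It assumes the values arrive as integers and that ascending order is acceptable, as in the 371(2) 468(5) 512(3) example quoted below.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of Hadoop: counts repeated values for one key.
final class ValueCounts {

    /** Counts each distinct value and renders "value(count)" entries joined by tabs. */
    static String format(Iterator<Integer> values) {
        // A TreeMap keeps the distinct values in ascending order.
        Map<Integer, Integer> counts = new TreeMap<>();
        while (values.hasNext()) {
            counts.merge(values.next(), 1, Integer::sum);
        }

        StringBuilder line = new StringBuilder();
        for (Map.Entry<Integer, Integer> entry : counts.entrySet()) {
            if (line.length() > 0) {
                line.append('\t'); // tab-separated, as suggested earlier in the thread
            }
            line.append(entry.getKey()).append('(').append(entry.getValue()).append(')');
        }
        return line.toString();
    }

    public static void main(String[] args) {
        // Values taken from the second example row in the quoted question.
        Iterator<Integer> values =
                Arrays.asList(468, 468, 468, 468, 371, 371, 468, 512, 512, 512).iterator();
        System.out.println("1.7.0.1_50098\t" + format(values));
    }
}
```

Inside an actual reducer, the same loop would presumably run over values.next().get() for each IntWritable, with the reducer declared to emit Text, Text and the resulting string wrapped in a Text before calling output.collect. Holding the counts in memory assumes the number of distinct values per key is small.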

On 4/15/08 6:18 AM, "Natarajan, Senthil" wrote:

> Thanks Ted that worked.
>
> I have one more question.
>
> Now I have the Reduce output is something like this.
>
> K1 v1 v1 v1
> K2 v2 v3 v3 v2 v2
>
> I would like to have it in this way
>
> K1 v1(3)
> K2 v2(3) v3(2)
>
> Example:
>
> 8.14.0.2_12904 371 371 371
> 1.7.0.1_50098 468 468 468 468 371 371 468 512 512 512
>
> 8.14.0.2_12904 371(3)
> 1.7.0.1_50098 371(2) 468(5) 512(3)
>
>
> Is there any easy way to do this in Hadoop other than conventional way of
> creating script which will sequentially parse each line and Iterate.
>
> Thanks,
> Senthil
>
> -----Original Message-----
> From: Ted Dunning [mailto:tdunning@veoh.com]
> Sent: Monday, April 14, 2008 2:20 PM
> To: core-user@hadoop.apache.org
> Subject: Re: Reduce Output
>
>
> Try using Text, Text as the output type and use something like a
> StringBuffer or Formatter to construct a tab-separated list.
>
>
> On 4/14/08 11:13 AM, "Natarajan, Senthil" wrote:
>
>> Could you please let me know or point out how to store the output of reduce in
>> this format
>> K1 v1 v2
>> K2 v1 v2 v3 v4
>> K3 v1
>> K4 v1 v2
>>
>> Right now I am getting this format
>> K1 v1v2
>> K2 v1v2v3v4
>> K3 v1
>> K4 v1v2
>>
>> Here is the Reduce class, what needs to be changed here?
>>
>> public static class Reduce extends MapReduceBase implements Reducer<Text,
>> IntWritable, Text, IntWritable> {
>>   public void reduce(Text key, Iterator<IntWritable> values,
>>       OutputCollector<Text, IntWritable> output, Reporter reporter) throws
>>       IOException {
>>     int sum = 0;
>>     while (values.hasNext()) {
>>       sum += values.next().get() ;
>>     }
>>     output.collect(key, new IntWritable(sum));
>>   }
>>
>> }
>>
>>
>>
>> -----Original Message-----
>> From: Ted Dunning [mailto:tdunning@veoh.com]
>> Sent: Monday, April 14, 2008 1:49 PM
>> To: core-user@hadoop.apache.org
>> Subject: Re: Reduce Output
>>
>>
>> The format of the reduce output is the responsibility of the reducer. You
>> can store the output any way you like.
>>
>>
>> On 4/14/08 10:17 AM, "Natarajan, Senthil" wrote:
>>
>>> Thanks Ted.
>>>
>>> Actually I was trying to do the third option by myself before posting this
>>> question.
>>> Problem is I couldn't get the Reduce output like this
>>>
>>> 1.0.2.92 206 475
>>> 1.0.2.9 316 475 847
>>>
>>> If the values separated by space or something so that I can use sequential
>>> script to iterate.
>>>
>>> But the problem is the values are like this in the reduce output
>>> 1.0.2.92 206475
>>> 1.0.2.9 316475847
>>>
>>> So do you know any class or method that I can use to have the values separated
>>> by space or any other separator.
>>>
>>> Thanks,
>>> Senthil
>>>
>>> -----Original Message-----
>>> From: Ted Dunning [mailto:tdunning@veoh.com]
>>> Sent: Monday, April 14, 2008 12:47 PM
>>> To: core-user@hadoop.apache.org
>>> Subject: Re: Reduce Output
>>>
>>>
>>> Write an additional map-reduce step to join the data items together by
>>> treating different input files differently.
>>>
>>> OR
>>>
>>> Write an additional map-reduce step that reads in your string values in the
>>> map configuration method and keeps them in memory for looking up as you pass
>>> over the output of your previous reduce step. You won't need a reducer for
>>> this approach, but your conversion table will have to fit into memory.
>>>
>>> OR
>>>
>>> Write a sequential script to read your string values and iterate over the
>>> reduce output using conventional methods. This works very well if you can
>>> process your data in less time than hadoop takes to start your job.
>>>
>>>
>>> On 4/14/08 9:42 AM, "Natarajan, Senthil" wrote:
>>>
>>>> Hi,
>>>>
>>>> I have the reduce output like this.
>>>>
>>>> 1.0.2.92 206475
>>>> 1.0.2.9 316475847
>>>> 1.0.3.93 3846495
>>>> 1.0.4.93 316975
>>>>
>>>> But I want to display like this...
>>>>
>>>> 1.0.2.92 206 475
>>>> 1.0.2.9 316 475 847
>>>> 1.0.3.93 384 6495
>>>> 1.0.4.93 316 975
>>>>
>>>> And each value has description associated with it something like this
>>>>
>>>> 206 -> TextDesp206
>>>> 475 -> TextDesp475
>>>> 316 -> TextDesp316
>>>> 847 -> TextDesp847
>>>>
>>>> So eventually I would like to see my output look like this
>>>>
>>>> 1.0.2.92 TextDesp206 -> TextDesp475
>>>> 1.0.2.9 TextDesp316 -> TextDesp475 -> TextDesp847
>>>>
>>>> How to do this, I tried different ways, but no luck.
>>>>
>>>> public static class Reduce extends MapReduceBase implements Reducer<Text,
>>>> IntWritable, Text, IntWritable> {
>>>>
>>>>   public void reduce(Text key, Iterator<IntWritable> values,
>>>>       OutputCollector<Text, IntWritable> output, Reporter reporter) throws
>>>>       IOException {
>>>>
>>>>     Text word = new Text();
>>>>
>>>>     String sum = "";
>>>>
>>>>     while (values.hasNext()) {
>>>>
>>>>       sum += values.next().get() + " ";
>>>>
>>>>     }
>>>>
>>>>     //output.collect(key, new IntWritable(Integer.parseInt(sum)));
>>>>
>>>>     word.set(sum);
>>>>
>>>>     output.collect(word, new
>>>>         IntWritable(Integer.parseInt(key.toString())));
>>>>
>>>>   }
>>>>
>>>> }
>>>>
>>>> Is there any way to use Reducer and OutputCollector or any other classes to
>>>> output like this
>>>>
>>>> 1.0.2.92 TextDesp206 -> TextDesp475
>>>> 1.0.2.9 TextDesp316 -> TextDesp475 -> TextDesp847
>>>>
>>>>
>>>> Thanks,
>>>> Senthil