Date: Fri, 17 Jun 2011 13:30:53 -0700
Subject: Mystery, A Tale of Two Reducers
From: Geoffry Roberts
To: mapreduce-user@hadoop.apache.org

All,

I have come across a situation that I don't understand.

First Reducer:

Behold the first of two reducers. A fragment of its output follows. Simple, no? It doesn't do anything beyond writing each value straight back out. I've highlighted two records from the output. Keep them in mind when we look at the second reducer.

protected void reduce(Text key, Iterable<Text> visitors, Context ctx)
        throws IOException, InterruptedException {
    // Pass every value straight through, unchanged.
    for (Text visitor : visitors) {
        ctx.write(key, visitor);
    }
}

2005-09-16=33614    42340108    more==>
2005-09-16=33614    42340106    more==>
*2005-09-16=33614    42340113    more==>*
2005-09-16=44135    42324490    more==>
2005-09-16=44135    42339700    more==>
...
*2005-09-16=44135    42324489    more==>*


Second Reducer:

This is a variation on the reducer from above. A fragment of its output follows. The difference is that I add all visitors to a list, then iterate through the list to produce my output. Remember the two highlighted records from above? They now show up in the output as duplicates, and the other records appear to be missing. Why? I have never seen an ArrayList behave like this. It must have something to do with Hadoop.

I have reasons for using the list. One is that I must have a full count of all visitors before I can write my output, but I'll spare you the rest.

To my mind, this second reducer should output the same as the first.

protected void reduce(Text key, Iterable<Text> visitors, Context ctx)
        throws IOException, InterruptedException {
    // Collect every visitor first (I need the full count before writing).
    List<Text> list = new ArrayList<Text>();
    for (Text visitor : visitors) {
        list.add(visitor);
    }

    // Then write the collected visitors out.
    for (Text visitor : list) {
        ctx.write(key, visitor);
    }
}


2005-09-16=33614    42340113    more==>
2005-09-16=33614    42340113    more==>
2005-09-16=33614    42340113    more==>
2005-09-16=44135    42324489    more==>
2005-09-16=44135    42324489    more==>
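
A possible way to reproduce the symptom without a cluster, as a sketch only: the plain-Java program below assumes that the Iterable passed to reduce hands back one and the same mutable object on every step, refilled with the next value each time. That reuse is the assumption being tested here, not something the reducers above prove. MutableValue and ReusingIterable are hypothetical stand-ins for Text and the framework's iterable; neither is a Hadoop class.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class ReusedValueDemo {

    // Hypothetical stand-in for a mutable value type such as Text.
    static final class MutableValue {
        private String content = "";

        MutableValue() {}

        // Copy constructor: takes a snapshot of the other value's content.
        MutableValue(MutableValue other) { this.content = other.content; }

        void set(String s) { content = s; }

        @Override public String toString() { return content; }
    }

    // Hypothetical iterable that, by assumption, reuses ONE MutableValue
    // instance and overwrites its content on every call to next().
    static final class ReusingIterable implements Iterable<MutableValue> {
        private final List<String> raw;

        ReusingIterable(List<String> raw) { this.raw = raw; }

        @Override public Iterator<MutableValue> iterator() {
            final Iterator<String> it = raw.iterator();
            final MutableValue shared = new MutableValue();
            return new Iterator<MutableValue>() {
                @Override public boolean hasNext() { return it.hasNext(); }

                @Override public MutableValue next() {
                    shared.set(it.next()); // same object, new content
                    return shared;
                }
            };
        }
    }

    // Mirrors the second reducer: stores the references it is handed.
    static List<String> collectByReference(Iterable<MutableValue> values) {
        List<MutableValue> list = new ArrayList<>();
        for (MutableValue v : values) {
            list.add(v);
        }
        List<String> out = new ArrayList<>();
        for (MutableValue v : list) {
            out.add(v.toString());
        }
        return out;
    }

    // Variant that stores a copy of each value instead of the reference.
    static List<String> collectByCopy(Iterable<MutableValue> values) {
        List<MutableValue> list = new ArrayList<>();
        for (MutableValue v : values) {
            list.add(new MutableValue(v));
        }
        List<String> out = new ArrayList<>();
        for (MutableValue v : list) {
            out.add(v.toString());
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> raw = Arrays.asList("42340108", "42340106", "42340113");
        // prints [42340113, 42340113, 42340113]
        System.out.println(collectByReference(new ReusingIterable(raw)));
        // prints [42340108, 42340106, 42340113]
        System.out.println(collectByCopy(new ReusingIterable(raw)));
    }
}
```

Under that assumption the list ends up holding the same object several times, so every entry shows whatever was written into it last, which matches the duplicated last record per key above. If Hadoop's iterable behaves this way too, the analogous change in the second reducer would be to store a copy, for example list.add(new Text(visitor)), rather than the reference itself.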

Thanks in advance

--
Geoffry Roberts
