Date: Fri, 5 Apr 2013 13:30:17 -0700
Subject: Difference between combiner and aggregator
From: jamal sasha
To: user@hadoop.apache.org
Hi,
I am trying to understand the difference between a combiner and an aggregator.

Based on my reading, here is the word count example (mapper side):

Aggregator:
class Mapper
  method MAP
    H <-- associative array
    for all term t in document:
      H{t} = H{t} + 1
    for all term t in H do
      EMIT(term t, count H{t})
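A minimal, Hadoop-free Java sketch of the aggregator version above, assuming whitespace tokenization and that map() simply returns the pairs it would EMIT (the class name PerDocumentAggregator is hypothetical). Because H is created inside map(), it only ever sees a single document.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a mapper: map() is called once per document and
// returns the (term, count) pairs it would EMIT. Not real Hadoop code.
class PerDocumentAggregator {

    static List<Map.Entry<String, Integer>> map(String document) {
        // H is created fresh on every call, so it aggregates within one document only.
        Map<String, Integer> h = new HashMap<>();
        for (String term : document.split("\\s+")) {
            if (term.isEmpty()) {
                continue; // split() yields "" for leading whitespace or empty input
            }
            h.merge(term, 1, Integer::sum); // H{t} = H{t} + 1
        }
        // EMIT(t, H{t}) once per distinct term in this document.
        List<Map.Entry<String, Integer>> emitted = new ArrayList<>();
        for (Map.Entry<String, Integer> e : h.entrySet()) {
            emitted.add(new SimpleEntry<>(e.getKey(), e.getValue()));
        }
        return emitted;
    }
}
```

For example, map("a b a") returns two pairs, a=2 and b=1 (HashMap iteration order is unspecified).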


Combiner:
class Mapper
  method INITIALIZE
    H <-- associative array
  method MAP
    for all term t in document:
      H{t} = H{t} + 1
  method CLOSE
    for all term t in H do
      EMIT(term t, count H{t})
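The same idea for the second version, again as a hypothetical plain-Java sketch rather than real Hadoop code: H is a field that outlives individual map() calls, and pairs are only produced in close(). In Hadoop's org.apache.hadoop.mapreduce.Mapper the matching lifecycle hooks are setup(), map() and cleanup(). This pattern is often called in-mapper combining, to distinguish it from a combiner class registered with Job.setCombinerClass.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical mapper mirroring the INITIALIZE / MAP / CLOSE lifecycle.
// It does not depend on Hadoop; it only models when state is created and emitted.
class InMapperCombiningMapper {
    private Map<String, Integer> h;

    // INITIALIZE: H <-- associative array, created once per map task.
    void initialize() {
        h = new HashMap<>();
    }

    // MAP: called once per document; updates H but emits nothing.
    void map(String document) {
        for (String term : document.split("\\s+")) {
            if (!term.isEmpty()) {
                h.merge(term, 1, Integer::sum); // H{t} = H{t} + 1
            }
        }
    }

    // CLOSE: emit one pair per distinct term seen across all map() calls.
    List<Map.Entry<String, Integer>> close() {
        List<Map.Entry<String, Integer>> emitted = new ArrayList<>();
        for (Map.Entry<String, Integer> e : h.entrySet()) {
            emitted.add(new SimpleEntry<>(e.getKey(), e.getValue()));
        }
        return emitted;
    }
}
```

Calling initialize(), then map("a b a") and map("a c"), then close() yields three pairs (a=3, b=1, c=1), whereas the per-document version would emit four pairs for the same two documents.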

So the second method is how a combiner is implemented.
But isn't the first one much simpler?
What do I gain by using a combiner instead of local aggregation?
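To make the difference concrete, the hypothetical helper below counts how many intermediate pairs one map task would hand to the shuffle for the same input split under three strategies: no aggregation (EMIT(t, 1) per token), per-document aggregation (the first version) and aggregation across the whole split (the second version). It assumes whitespace tokenization and deliberately ignores the memory needed to keep H alive for the whole split.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Counts intermediate (term, count) pairs a single map task would emit.
// "split" is a hypothetical list of documents processed by one mapper.
class EmittedPairCounter {

    private static String[] tokens(String document) {
        String trimmed = document.trim();
        return trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
    }

    // Plain word count: one pair per token.
    static int noAggregation(List<String> split) {
        int pairs = 0;
        for (String doc : split) {
            pairs += tokens(doc).length;
        }
        return pairs;
    }

    // First version: one pair per distinct term per document.
    static int perDocument(List<String> split) {
        int pairs = 0;
        for (String doc : split) {
            Set<String> distinct = new HashSet<>(Arrays.asList(tokens(doc)));
            pairs += distinct.size();
        }
        return pairs;
    }

    // Second version: one pair per distinct term across the whole split.
    static int acrossSplit(List<String> split) {
        Set<String> distinct = new HashSet<>();
        for (String doc : split) {
            distinct.addAll(Arrays.asList(tokens(doc)));
        }
        return distinct.size();
    }
}
```

For the split ["a b a", "a c", "b a"] the three methods return 7, 6 and 3. Per-document aggregation only helps when a term repeats inside one document, while the second version also collapses repeats across documents, at the price of holding H in memory until CLOSE.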
