Date: Sun, 12 Jan 2014 18:25:58 +0200
Subject: manipulating key in combine phase
From: Amit Sela
To: user@hadoop.apache.org

Hi all,

I was wondering if it is possible to manipulate the key during combine.

Say I have a MapReduce job where the key has many qualifiers. I would like to "split" the key into two (or more) keys if it has more than, say, 100 qualifiers.
In the combiner class I would do something like:

int count = 0;
for (Writable value : values) {
  if (++count >= 100) {
    context.write(newKey, value);
  } else {
    context.write(key, value);
  }
}

where newKey is something like key + randomUUID.

I know that the combiner can be called "zero, once or more..." times, and I'm getting strange results (the same key written more than once), so I would be glad to get some deeper insight into how the combiner works.

Thanks,

Amit.
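To make the splitting loop concrete, here is a minimal plain-Java sketch of it that does not use any Hadoop classes. The names KeySplitSketch, combine, THRESHOLD and the suffix parameter are hypothetical and only stand in for the combiner's reduce method, the 100-qualifier limit and the random UUID. Since the combiner may run more than once for the same key, main calls combine twice on the same key. Each call restarts its own count and builds its own new key, so the original key is written by both calls. That may be one source of the repeated keys, though this sketch cannot confirm it for the actual job.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.UUID;

// Plain-Java stand-in for the combiner loop; no Hadoop classes are involved.
// All names here are hypothetical and chosen only for illustration.
class KeySplitSketch {
    static final int THRESHOLD = 100;

    // Mirrors the loop in the message: the first (threshold - 1) values keep
    // the original key, every later value is written under key + suffix.
    static List<Map.Entry<String, String>> combine(
            String key, Iterable<String> values, int threshold, String suffix) {
        List<Map.Entry<String, String>> out = new ArrayList<>();
        String newKey = key + suffix;
        int count = 0;
        for (String value : values) {
            if (++count >= threshold) {
                out.add(new SimpleEntry<>(newKey, value));
            } else {
                out.add(new SimpleEntry<>(key, value));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> values = new ArrayList<>();
        for (int i = 0; i < 150; i++) {
            values.add("v" + i);
        }
        // Two separate invocations for the same key, as could happen when the
        // combiner runs more than once; each one gets its own random suffix.
        for (int run = 1; run <= 2; run++) {
            String suffix = "-" + UUID.randomUUID();
            List<Map.Entry<String, String>> out =
                    combine("key", values, THRESHOLD, suffix);
            long kept = out.stream()
                    .filter(e -> e.getKey().equals("key"))
                    .count();
            System.out.println("run " + run + ": " + kept
                    + " values under the original key, "
                    + (out.size() - kept) + " under key" + suffix);
        }
    }
}
```

The sketch also shows that the loop forwards every value unchanged and only relabels some of them, so it splits a key but does not reduce the number of records.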