Date: Mon, 18 Oct 2010 13:46:43 +0530
Subject: Reduce side join
From: Matthew John
To: common-user@hadoop.apache.org

Hi all,

I am working on a join operation using Hadoop. I came across the reduce-side join in Hadoop: The Definitive Guide. As far as I understand, the technique works as follows:

1) Read the two inputs using separate mappers and tag each input with a different value, so that in the sort/shuffle phase the primary-key record (there is only one record with a given key) comes before the records that carry the same key as a foreign key.

2) In the reduce phase, read the required portion of the first record into a variable and append it to each of the records that follow.

My question is: is it fine if more than one set of input records (a primary record followed by its foreign records) arrives in the same reduce phase? For example, will this technique work if I have just one reducer running?

Regards,
Matthew John
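
The two steps described in the message above can be sketched as a small plain-Python simulation. This is only an illustration and does not use the Hadoop API: the sample data, the tag values PRIMARY and FOREIGN, and every function name are hypothetical. It assumes each join key has exactly one primary record, and that the framework groups records by the join key alone while sorting on (join key, tag), which is how the single simulated reducer below handles several sets of records one after another.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical tags: the primary record must sort before the foreign ones.
PRIMARY, FOREIGN = 0, 1


def map_phase(primary_records, foreign_records):
    """Emit ((join_key, tag), value) pairs, one 'mapper' per input."""
    for key, value in primary_records:
        yield (key, PRIMARY), value
    for key, value in foreign_records:
        yield (key, FOREIGN), value


def shuffle_sort(pairs):
    """Sort on the composite key (join_key, tag), as a secondary sort would."""
    return sorted(pairs, key=itemgetter(0))


def reduce_group(join_key, values):
    """One reduce call: the first value is the primary record (assumed present)."""
    values = iter(values)
    primary = next(values)
    for foreign in values:
        yield join_key, primary + "\t" + foreign


def run_single_reducer(primary_records, foreign_records):
    """Simulate one reducer that receives every key group in sorted order."""
    sorted_pairs = shuffle_sort(map_phase(primary_records, foreign_records))
    output = []
    # Group on the join key only, ignoring the tag.
    for join_key, group in groupby(sorted_pairs, key=lambda kv: kv[0][0]):
        output.extend(reduce_group(join_key, (value for _, value in group)))
    return output


# Hypothetical sample inputs: customers are primary, orders are foreign.
customers = [("c2", "Bob"), ("c1", "Alice")]
orders = [("c1", "order-17"), ("c2", "order-18"), ("c1", "order-19")]

joined = run_single_reducer(customers, orders)
for key, record in joined:
    print(key, record)
```

In this sketch the variable holding the primary record is reset at the start of each key group, so one reducer can process any number of groups without mixing them up.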