Date: Mon, 5 Apr 2010 14:10:35 -0700
Subject: Reducer-side join example
From: M B
To: common-user@hadoop.apache.org

Hi,

I need a good Java example to get me started with some joining we need to do. Any examples would be appreciated.

File A:
Field1  Field2
A       12
B       13
C       22
A       24

File B:
Field1  Field2  Field3
A       Car     ...
B       Truck   ...
B       SUV     ...
B       Van     ...

So, we need to first join File A and File B on Field1 (say both are string fields). The result would just be:

A  12  Car    ...
A  24  Car    ...
B  13  Truck  ...
B  13  SUV    ...
B  13  Van    ...

and so on, with all the fields from both files returned.

Once we have that, we sometimes need to transform it so we have a single record per key (Field1):

A  (12,Car) (24,Car)
B  (13,Truck) (13,SUV) (13,Van)

The exact layout doesn't matter; basically we want a list of tuples for each key (we'll modify this later to return a concatenated set of fields from B, etc.).

At other times, instead of transforming to a single row, we just need to modify rows based on values. So if B.Field2 equals "Van", we need to set Output.Field2 = whatever and then output to file ...

Are there any good examples of this in native Java (we can't use Pig/Hive/etc.)?

Thanks.
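
For reference, here is a minimal sketch of the join logic described above. It is plain Java with no Hadoop dependency, so it only simulates the map, shuffle, and reduce steps in memory and is not an actual MapReduce job. The class and method names (ReduceSideJoinSketch, tagAndGroup, reduce, join, joinGrouped) are made up for illustration, and Field3 is left out because its contents are elided in the message. In a real job the tagging would presumably happen in the mapper and the pairing in the reducer.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class ReduceSideJoinSketch {

    // Stand-in for map + shuffle: tag each record with its source ("A" or "B")
    // and group the tagged values by Field1.
    static Map<String, List<String[]>> tagAndGroup(List<String[]> fileA, List<String[]> fileB) {
        Map<String, List<String[]>> grouped = new TreeMap<>();
        for (String[] rec : fileA) {
            grouped.computeIfAbsent(rec[0], k -> new ArrayList<>()).add(new String[] {"A", rec[1]});
        }
        for (String[] rec : fileB) {
            grouped.computeIfAbsent(rec[0], k -> new ArrayList<>()).add(new String[] {"B", rec[1]});
        }
        return grouped;
    }

    // Stand-in for reduce: split one key's values by tag, then pair every
    // A value with every B value (inner join, so unmatched keys emit nothing).
    static List<String[]> reduce(List<String[]> tagged) {
        List<String> fromA = new ArrayList<>();
        List<String> fromB = new ArrayList<>();
        for (String[] t : tagged) {
            if (t[0].equals("A")) {
                fromA.add(t[1]);
            } else {
                fromB.add(t[1]);
            }
        }
        List<String[]> pairs = new ArrayList<>();
        for (String a : fromA) {
            for (String b : fromB) {
                pairs.add(new String[] {a, b});
            }
        }
        return pairs;
    }

    // One output row per matching pair: Field1, A.Field2, B.Field2 (tab-separated).
    static List<String> join(List<String[]> fileA, List<String[]> fileB) {
        List<String> rows = new ArrayList<>();
        for (Map.Entry<String, List<String[]>> e : tagAndGroup(fileA, fileB).entrySet()) {
            for (String[] p : reduce(e.getValue())) {
                rows.add(e.getKey() + "\t" + p[0] + "\t" + p[1]);
            }
        }
        return rows;
    }

    // One output row per key, with the matches written as (A.Field2,B.Field2) tuples.
    static List<String> joinGrouped(List<String[]> fileA, List<String[]> fileB) {
        List<String> rows = new ArrayList<>();
        for (Map.Entry<String, List<String[]>> e : tagAndGroup(fileA, fileB).entrySet()) {
            List<String[]> pairs = reduce(e.getValue());
            if (pairs.isEmpty()) {
                continue; // key present in only one file
            }
            StringBuilder sb = new StringBuilder(e.getKey());
            for (String[] p : pairs) {
                sb.append(" (").append(p[0]).append(',').append(p[1]).append(')');
            }
            rows.add(sb.toString());
        }
        return rows;
    }

    public static void main(String[] args) {
        // Sample data from the message (Field1 and Field2 only).
        List<String[]> fileA = Arrays.asList(
            new String[] {"A", "12"}, new String[] {"B", "13"},
            new String[] {"C", "22"}, new String[] {"A", "24"});
        List<String[]> fileB = Arrays.asList(
            new String[] {"A", "Car"}, new String[] {"B", "Truck"},
            new String[] {"B", "SUV"}, new String[] {"B", "Van"});
        join(fileA, fileB).forEach(System.out::println);
        joinGrouped(fileA, fileB).forEach(System.out::println);
    }
}
```

With the sample data, key C appears only in File A and so produces no output. The "modify rows based on values" case (for example, rewriting a field when B.Field2 equals "Van") could be handled inside the pairing loop in reduce before the pair is emitted. Note that this sketch buffers all values for a key in memory, which may not be acceptable for heavily skewed keys.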