Subject: Re: Programming Question / Joining Dataset
From: Kai Voigt
Date: Wed, 26 Sep 2012 15:42:06 +0200
To: user@hadoop.apache.org
Cc: bharath vissapragada

The design pattern for this is called "Reduce-side Join". Enter it into Google and you will get a lot of details.

Kai

On 26.09.2012 at 15:39, "Oliver B. Fischer" wrote:

> Yes, I know Hive and also Pig. Both are suitable for my problem, but
> before starting with one of them I would simply like to know how to do
> it with pure MR. ;-)
>
> Bye,
>
> Oliver
>
> On 09/26/2012 03:36 PM, bharath vissapragada wrote:
>> Have you seen Hive[1]? It can join datasets over MapReduce. You can
>> also provide your custom SerDes to read your file format (to avoid
>> pre-processing) and create your own data types (e.g. Map of
>> Maps, Arrays, etc.).
>>
>> [1] https://cwiki.apache.org/Hive/home.html
>>
>> On Wed, Sep 26, 2012 at 6:49 PM, Oliver B. Fischer wrote:
>>
>> Hi all,
>>
>> I have to join two large datasets A and B. I preprocess both datasets
>> by parsing the source text files and creating custom datatypes ADT
>> and BDT out of them.
>>
>> Now I have to join these data. Both datasets A' and B' already
>> have the same datatype as key. But how can I pass both custom
>> datatypes ADT and BDT to the same reducer instance for joining?
>>
>> Bye,
>>
>> Oliver
>>
>> --
>> Regards,
>> Bharath .V
>> w:http://researchweb.iiit.ac.in/~bharath.v

--
Kai Voigt
k@123.org
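For reference, the reduce-side join named in the reply can be sketched in plain Java, without any Hadoop dependency. This is only an illustration of the idea, not code from the thread: each record is wrapped with a tag naming its source dataset, all wrapped records with the same key are grouped together (which is what the shuffle does in MapReduce), and the "reducer" separates them by tag again and pairs them up. The names `ReduceSideJoinSketch`, `Tagged`, `map`, `reduce` and `join` are hypothetical, the String payloads merely stand in for the ADT and BDT values, and an inner join is assumed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class ReduceSideJoinSketch {

    /** Hypothetical wrapper: a payload plus a tag naming its source dataset. */
    static final class Tagged {
        final char source;     // 'A' or 'B'
        final String payload;  // stands in for an ADT or BDT value

        Tagged(char source, String payload) {
            this.source = source;
            this.payload = payload;
        }
    }

    /** "Map" step: emit (key, tagged value); the map simulates grouping by key. */
    static void map(char source, String[][] records, Map<String, List<Tagged>> shuffle) {
        for (String[] r : records) {
            shuffle.computeIfAbsent(r[0], k -> new ArrayList<>())
                   .add(new Tagged(source, r[1]));
        }
    }

    /** "Reduce" step for one key: split values by tag, then emit every A-B pair. */
    static void reduce(String key, List<Tagged> values, List<String> out) {
        List<String> as = new ArrayList<>();
        List<String> bs = new ArrayList<>();
        for (Tagged t : values) {
            (t.source == 'A' ? as : bs).add(t.payload);
        }
        for (String a : as) {
            for (String b : bs) {
                out.add(key + "\t" + a + "\t" + b);
            }
        }
    }

    /** Runs both "mappers", then calls the "reducer" once per key in key order. */
    public static List<String> join(String[][] a, String[][] b) {
        Map<String, List<Tagged>> shuffle = new TreeMap<>();
        map('A', a, shuffle);
        map('B', b, shuffle);
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, List<Tagged>> e : shuffle.entrySet()) {
            reduce(e.getKey(), e.getValue(), out);
        }
        return out;
    }

    public static void main(String[] args) {
        String[][] a = { {"k1", "a1"}, {"k2", "a2"} };
        String[][] b = { {"k1", "b1"}, {"k1", "b2"}, {"k3", "b3"} };
        // Only k1 occurs in both inputs, so only k1 rows are printed.
        for (String line : join(a, b)) {
            System.out.println(line);
        }
    }
}
```

In an actual Hadoop job the same shape would likely use one mapper per input (for example via MultipleInputs) and a single value type that can carry either ADT or BDT along with its tag (for example a GenericWritable subclass). Those are suggestions under the assumptions above, not something the thread confirms.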