Return-Path: X-Original-To: apmail-hive-user-archive@www.apache.org Delivered-To: apmail-hive-user-archive@www.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id B6E74848B for ; Fri, 12 Aug 2011 02:25:49 +0000 (UTC) Received: (qmail 93111 invoked by uid 500); 12 Aug 2011 02:25:48 -0000 Delivered-To: apmail-hive-user-archive@hive.apache.org Received: (qmail 93034 invoked by uid 500); 12 Aug 2011 02:25:48 -0000 Mailing-List: contact user-help@hive.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: user@hive.apache.org Delivered-To: mailing list user@hive.apache.org Received: (qmail 93026 invoked by uid 99); 12 Aug 2011 02:25:48 -0000 Received: from athena.apache.org (HELO athena.apache.org) (140.211.11.136) by apache.org (qpsmtpd/0.29) with ESMTP; Fri, 12 Aug 2011 02:25:48 +0000 X-ASF-Spam-Status: No, hits=2.2 required=5.0 tests=FREEMAIL_FROM,HTML_MESSAGE,RCVD_IN_DNSWL_NONE,SPF_PASS X-Spam-Check-By: apache.org Received-SPF: pass (athena.apache.org: local policy) Received: from [98.138.91.58] (HELO nm17-vm0.bullet.mail.ne1.yahoo.com) (98.138.91.58) by apache.org (qpsmtpd/0.29) with SMTP; Fri, 12 Aug 2011 02:25:39 +0000 Received: from [98.138.90.56] by nm17.bullet.mail.ne1.yahoo.com with NNFMP; 12 Aug 2011 02:25:18 -0000 Received: from [98.138.89.161] by tm9.bullet.mail.ne1.yahoo.com with NNFMP; 12 Aug 2011 02:25:18 -0000 Received: from [127.0.0.1] by omp1017.mail.ne1.yahoo.com with NNFMP; 12 Aug 2011 02:25:18 -0000 X-Yahoo-Newman-Property: ymail-3 X-Yahoo-Newman-Id: 323043.40672.bm@omp1017.mail.ne1.yahoo.com Received: (qmail 72586 invoked by uid 60001); 12 Aug 2011 02:25:17 -0000 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=yahoo.com; s=s1024; t=1313115917; bh=MCVaKnFIXD8VTYglLBRedGgLIPvIfm0s+BUMJYIu24s=; h=X-YMail-OSG:Received:X-Mailer:References:Message-ID:Date:From:Reply-To:Subject:To:In-Reply-To:MIME-Version:Content-Type; b=JZYrqkyL94Y/effgO7SArD59s31pkJDXM9doe8zEeGhz56AdYGejArgvHaEvezVPkDSB838ZrJwkwac2j6ZS+ItSxWDLFg1LvL+TEbmO/iSXjPuZhOGoQTD9ibUGq9JnuFn9ZFDc39VccwPXGDjnBXlDVQDVwqYpoS4QKXscMU0= DomainKey-Signature: a=rsa-sha1; q=dns; c=nofws; s=s1024; d=yahoo.com; h=X-YMail-OSG:Received:X-Mailer:References:Message-ID:Date:From:Reply-To:Subject:To:In-Reply-To:MIME-Version:Content-Type; b=REiMMCcpldfnfpMNrSvdpGFFhvLUJYoEfXUA0X/9Aw5Hb4YRIGkispZTI/LLKSfJPYTXd2Dx6CjEFr389Ctz8twI+pDGYcPcgrlAoHHS/snDbz1MAKc0AVwjWrbV0auL5KK9tylWthJWKC65s5t2xPeKSqbJyk5QU330zpaWCPA=; X-YMail-OSG: 2o.w9hAVM1nfRnqwShN2O0CB3oKrzQHz2MT38Kan1C.5PYj bLU19Z2spzPkxiV9Dr4mMRc0zoLwHCjcVsaNBBpFRosEG069JURuZjEvaFy5 mddIIZvUQdDTi.FELIlEtg3vhai9bZQMkX3pIIJKieYFVMvnQr4CblS1HkG8 A1mkeJ2XdCOL5zw1hyPmTiqaAcq.RUFkL1fd90ruXOahk63EFlFPiFeycd7P GDXMMa1Hw_Rv4gOqOG69AKtf0E5lL1XP4K40aay7oaW_.huBS5Or7d7hAKN4 fjxYtW1RgwnpO_x_gGxuJ26PfnGsPKlf_UDiDoK.g1EW2qBYjMF2zEerXQpd S3O_jwQnKfyyNNlee4OEbIviRYKhKLaIq0Woq_5qL5zJhB8xE5163I460XX5 2.TccvxyfMYimVNwsQi6b9YTIJ1BFdXZCUAJ8DHWbsSpqYPLDT5cHk9Y9EP4 92ar7SY19PcLUK0gl1I1niuyMccOmXz96EAfHTGnds20HsInlD.ySdA9y7qv bPuoKh71L4JDzaWE1wXAtju_Duw79Geql2swUFyuxKdBZWTyyMLHZFncqcvg sTN_N2g01wkfv9QND7hGCMFosIBWGYrT8qR3XqonLIgrOLVtrVcMgXuIb6z8 Pzt_RiMEjQLUO Received: from [173.164.150.189] by web110503.mail.gq1.yahoo.com via HTTP; Thu, 11 Aug 2011 19:25:17 PDT X-Mailer: YahooMailWebService/0.8.113.313619 References: <361d713f.f14f.131bbb95b6f.Coremail.hadoop_wu@163.com> Message-ID: <1313115917.52679.YahooMailNeo@web110503.mail.gq1.yahoo.com> Date: Thu, 11 Aug 2011 19:25:17 -0700 (PDT) From: Ayon Sinha Reply-To: Ayon Sinha Subject: Re: multiple tables join with only one hug table. To: "user@hive.apache.org" In-Reply-To: <361d713f.f14f.131bbb95b6f.Coremail.hadoop_wu@163.com> MIME-Version: 1.0 Content-Type: multipart/alternative; boundary="0-1098535769-1313115917=:52679" --0-1098535769-1313115917=:52679 Content-Type: text/plain; charset=iso-8859-1 Content-Transfer-Encoding: quoted-printable The Mapjoin hint syntax help optimize by loading the smaller tables specifi= ed in the Mapjoin hint into memory. Then every small table is in memory of = each mapper.=0A=A0=0A-Ayon=0ASee My Photos on Flickr=0AAlso check out my Bl= og for answers to commonly asked questions.=0A=0A=0A=0A____________________= ____________=0AFrom: "Daniel,Wu" =0ATo: hive =0ASent: Thursday, August 11, 2011 7:01 PM=0ASubject: multiple t= ables join with only one hug table.=0A=0A=0Aif the retailer fact table is s= ale_fact with 10B rows, and join with 3 small tables: stores (10K), product= s(10K), period (1K). What's the best join solution?=0A=0AIn oracle, it can = first build hash for stores, and hash for products, and hash for stores. Th= en probe using the fact table, if the row matched in stores, that row can g= o up further to map with products by hashing check, if pass, then go up fur= ther to try to match period. In this way, the sale_fact only needs to be sc= anned once which save lots of disk IO.=A0 Is this doable in hive, if doable= , what hint need to use? --0-1098535769-1313115917=:52679 Content-Type: text/html; charset=iso-8859-1 Content-Transfer-Encoding: quoted-printable
The Mapjoin hint synt= ax help optimize by loading the smaller tables specified in the Mapjoin hin= t into memory. Then every small table is in memory of each mapper.
 

From: "Daniel,Wu" <ha= doop_wu@163.com>
To:= hive <user@hive.apache.org>
Sent= : Thursday, August 11, 2011 7:01 PM
Subject: multiple tables join with only one hug tab= le.

if the retailer= fact table is sale_fact with 10B rows, and join with 3 small tables: store= s (10K), products(10K), period (1K). What's the best join solution?

= In oracle, it can first build hash for stores, and hash for products, and h= ash for stores. Then probe using the fact table, if the row matched in stor= es, that row can go up further to map with products by hashing check, if pa= ss, then go up further to try to match period. In this way, the sale_fact o= nly needs to be scanned once which save lots of disk IO.  Is this doab= le in hive, if doable, what hint need to use?




--0-1098535769-1313115917=:52679--