From: Bruce Bian
To: user@hive.apache.org
Date: Tue, 22 May 2012 23:07:38 +0800
Subject: Condition for doing a sort merge bucket map join

Hi,

I've got 7 large tables (each ~10 GB) to join into one table, all on the same 2 join keys. I've read some documentation on sort merge bucket map join, but I haven't been able to trigger it.

I've bucketed all 7 tables into 20 buckets, sorted them by one of the join keys, and set the following parameters when running the join:

set hive.optimize.bucketmapjoin = true;
set hive.optimize.bucketmapjoin.sortedmerge = true;
set hive.input.format=org.apache.hadoop.hive.ql.io.BucketizedHiveInputFormat;

What else am I missing? Do I have to bucket on both of the join keys (I'm currently trying this)? And does each bucket file have to be smaller than one HDFS block?

Thanks a lot.
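
For reference, the variant mentioned in the message (bucketing and sorting on both join keys) might look roughly like the HiveQL sketch below. The table names, column names, and types are hypothetical, and only three of the seven tables are shown. The hive.enforce.* settings and the MAPJOIN hint are assumptions about how the tables would be loaded and queried on a Hive release of that era; none of them are stated in the message, and they may not apply to other versions.

```sql
-- Hypothetical table; t2 and t3 are assumed to be declared the same way,
-- with the same bucketing columns, sort columns, and bucket count.
CREATE TABLE t1 (key1 BIGINT, key2 BIGINT, val STRING)
CLUSTERED BY (key1, key2) SORTED BY (key1, key2) INTO 20 BUCKETS;

-- Assumption: enforce bucketing and sorting while loading, so that the
-- files on disk actually match the table's declared layout.
SET hive.enforce.bucketing = true;
SET hive.enforce.sorting = true;
INSERT OVERWRITE TABLE t1
SELECT key1, key2, val FROM t1_raw;  -- t1_raw is a hypothetical staging table

-- The three settings quoted in the message.
SET hive.optimize.bucketmapjoin = true;
SET hive.optimize.bucketmapjoin.sortedmerge = true;
SET hive.input.format = org.apache.hadoop.hive.ql.io.BucketizedHiveInputFormat;

-- Assumption: a MAPJOIN hint naming the non-streamed tables, with the join
-- condition covering exactly the bucketing and sort columns.
SELECT /*+ MAPJOIN(b, c) */ a.key1, a.key2, a.val, b.val, c.val
FROM t1 a
JOIN t2 b ON a.key1 = b.key1 AND a.key2 = b.key2
JOIN t3 c ON a.key1 = c.key1 AND a.key2 = c.key2;
```

Running EXPLAIN on the final query is one way to check which join operator the planner actually chose.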