Return-Path: X-Original-To: apmail-hive-user-archive@www.apache.org Delivered-To: apmail-hive-user-archive@www.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id 9A89217DDC for ; Thu, 29 Jan 2015 14:13:59 +0000 (UTC) Received: (qmail 50230 invoked by uid 500); 29 Jan 2015 14:13:57 -0000 Delivered-To: apmail-hive-user-archive@hive.apache.org Received: (qmail 50171 invoked by uid 500); 29 Jan 2015 14:13:57 -0000 Mailing-List: contact user-help@hive.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: user@hive.apache.org Delivered-To: mailing list user@hive.apache.org Received: (qmail 50161 invoked by uid 99); 29 Jan 2015 14:13:57 -0000 Received: from athena.apache.org (HELO athena.apache.org) (140.211.11.136) by apache.org (qpsmtpd/0.29) with ESMTP; Thu, 29 Jan 2015 14:13:57 +0000 X-ASF-Spam-Status: No, hits=1.5 required=5.0 tests=HTML_MESSAGE,RCVD_IN_DNSWL_LOW,SPF_PASS X-Spam-Check-By: apache.org Received-SPF: pass (athena.apache.org: domain of matshyeq@gmail.com designates 209.85.223.180 as permitted sender) Received: from [209.85.223.180] (HELO mail-ie0-f180.google.com) (209.85.223.180) by apache.org (qpsmtpd/0.29) with ESMTP; Thu, 29 Jan 2015 14:13:53 +0000 Received: by mail-ie0-f180.google.com with SMTP id rl12so33982716iec.11 for ; Thu, 29 Jan 2015 06:13:32 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:from:date:message-id:subject:to :content-type; bh=FWhXAXIFjvdwVYRV9B/Gr/Vw4Hu3h6eDHKCGae5kHoY=; b=YM3xe5uvjmQaLgeiHg9auzhfd1j/Gj1AVYmDKTrzynP6k0KFIR67ubXXh3b8AHcojZ AY/lzq0X+JzBmjZS+KID4psfT2rTNqokrxqkyABcNLCYgPThVeILL3S0LhngxcY7bill svQgmrZd6huTDKI/JeAOmyNjmFjHQJ1h2kIqX3wP4QQBKUBrcCRnqOnveafyAYeHA0y8 PwKqQFgO+KJ1emKUijkunXevt19ayzkzX6ROs0lMzNnwBRNzobww6yhPBXr1JvBmYlss lXbBqU9Dr0etJQgDvbqDAgHpqx31+sRTh0aMiHp3ufJsd1/SjFR4eLdAH4kZX0ItmyO2 3svA== X-Received: by 10.42.94.79 with SMTP id a15mr791378icn.30.1422540812240; Thu, 29 Jan 2015 06:13:32 -0800 (PST) MIME-Version: 1.0 Received: by 10.64.227.83 with HTTP; Thu, 29 Jan 2015 06:13:12 -0800 (PST) In-Reply-To: <2B3C195A-7622-4309-879C-3079869E0346@icloud.com> References: <2B3C195A-7622-4309-879C-3079869E0346@icloud.com> From: matshyeq Date: Thu, 29 Jan 2015 14:13:12 +0000 Message-ID: Subject: Re: Partitioned table and Bucket Map Join To: user Content-Type: multipart/alternative; boundary=485b397dd1c1cb40c6050dcb1705 X-Virus-Checked: Checked by ClamAV on apache.org --485b397dd1c1cb40c6050dcb1705 Content-Type: text/plain; charset=UTF-8 My hunch is while partitioning is in fact very similar to bucketing (actually superior as you have some control over what file data goes to) the hive optimizer only applies bucket joins if your tables are bucketed so your join condition t1.bucketed_column = t2.bucketed_column triggers the bucketed map join but t1.partitioned_column = t2.partitioned_column doesn't. I'm hoping someone with deeper Hive knowledge would be able to confirm this. Thank you, Kind Regards ~Maciek On Thu, Jan 29, 2015 at 1:51 PM, murali parimi < muralikrishna.parimi@icloud.com> wrote: > I faced the same situation where two tables with 3 billion records on each > side and partitioned, sorted on same key. Set the following parameters in > the hive query assuming the join will happen in the map phase. > > set > hive.input.format=org.apache.hadoop.hive.ql.io.BucketizedHiveInputFormat; > set hive.optimize.bucketmapjoin=true; > set hive.optimize.bucketmapjoin.sortedmerge=true; > set hive.enforce.sorting=true; > > I am using hive version 13 and the storage format is Orc. One of the table > is small in size but I haven't checked whether irfan fit in the cache as we > have huge memory. But the map sided join didn't happen. What could be the > reason? > > Sent from my iPhone > > > On Jan 29, 2015, at 7:38 AM, matshyeq wrote: > > > > I do have two tables partitioned on the same criteria. > > Could I still take advantage of Bucket Map Join or better, Sort Merge > Bucket Map Join? > > How? > > > > ~Maciek > --485b397dd1c1cb40c6050dcb1705 Content-Type: text/html; charset=UTF-8 Content-Transfer-Encoding: quoted-printable
My hunch is while partitioning is in fact very simila= r to bucketing (actually superior as you have some control over what file d= ata goes to) the hive optimizer only applies bucket joins if your tables ar= e bucketed so your join condition
=C2=A0 =C2=A0t1.bucketed_column= =3D t2.bucketed_column
triggers the bucketed map join
= but
=C2=A0 =C2=A0t1.partitioned_column =3D t2.partitioned_column<= /div>
doesn't.
I'm hoping someone with deeper Hive kn= owledge would be able to confirm this.

Thank you,
Kind Regards=C2=A0
~Maciek

On Thu, Jan 29, 2015 at 1:51 PM, murali pari= mi <muralikrishna.parimi@icloud.com> wrote:
I faced the same situation where two tables= with 3 billion records on each side and partitioned, sorted on same key. S= et the following parameters in the hive query assuming the join will happen= in the map phase.

set hive.input.format=3Dorg.apache.hadoop.hive.ql.io.BucketizedHiveInputFor= mat;
set hive.optimize.bucketmapjoin=3Dtrue;
set hive.optimize.bucketmapjoin.sortedmerge=3Dtrue;
set hive.enforce.sorting=3Dtrue;

I am using hive version 13 and the storage format is Orc. One of the table = is small in size but I haven't checked whether irfan fit in the cache a= s we have huge memory. But the map sided join didn't happen. What could= be the reason?

Sent from my iPhone

> On Jan 29, 2015, at 7:38 AM, matshyeq <matshyeq@gmail.com> wrote:
>
> I do have two tables partitioned on the same criteria.
> Could I still take advantage of Bucket Map Join or better, Sort Merge = Bucket Map Join?
> How?
>
> ~Maciek

--485b397dd1c1cb40c6050dcb1705--