Return-Path: X-Original-To: apmail-hive-user-archive@www.apache.org Delivered-To: apmail-hive-user-archive@www.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id E3AC910C4B for ; Tue, 18 Nov 2014 22:21:51 +0000 (UTC) Received: (qmail 88334 invoked by uid 500); 18 Nov 2014 22:21:50 -0000 Delivered-To: apmail-hive-user-archive@hive.apache.org Received: (qmail 88271 invoked by uid 500); 18 Nov 2014 22:21:50 -0000 Mailing-List: contact user-help@hive.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: user@hive.apache.org Delivered-To: mailing list user@hive.apache.org Received: (qmail 88261 invoked by uid 99); 18 Nov 2014 22:21:50 -0000 Received: from nike.apache.org (HELO nike.apache.org) (192.87.106.230) by apache.org (qpsmtpd/0.29) with ESMTP; Tue, 18 Nov 2014 22:21:50 +0000 X-ASF-Spam-Status: No, hits=1.5 required=5.0 tests=HTML_MESSAGE,RCVD_IN_DNSWL_LOW,SPF_PASS X-Spam-Check-By: apache.org Received-SPF: pass (nike.apache.org: domain of hzhu@maprtech.com designates 74.125.82.46 as permitted sender) Received: from [74.125.82.46] (HELO mail-wg0-f46.google.com) (74.125.82.46) by apache.org (qpsmtpd/0.29) with ESMTP; Tue, 18 Nov 2014 22:21:23 +0000 Received: by mail-wg0-f46.google.com with SMTP id x12so1123215wgg.5 for ; Tue, 18 Nov 2014 14:20:37 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:mime-version:date:message-id:subject:from:to :content-type; bh=U2jXjhLdmsXQkravZHDiiieEgA7uUdAC2GcAKUFrTc0=; b=ChZ3eq76tptXcidP1ruI2bhvwU+xRfIkhfVXves9D9vEcmRXYVqq8MlIDKmJP9aUmF THELP4UANZ/UBe7r3N3Ckzlp6tERPlHscUe7x9pDkQn5trmJV/5NDY7hV3z8SLwe8brV N/R7sl7JbQWNn2KyZ7TBBGPcbdICmK7hg2zH/TMEGKXfUbvS3FUN0wwTeJNTXe6tYFWd dcLPW7OzA0Nbnjq6dZX4Kj+sO7Wg+DHEq4/We8VPiRgzh2jxIU5QfzXpfnD2BxVMeU3k FlGdiw7pDP5YqR8kcBe76bmdcfv62uQp4FFx3ruMWsb34v6rQSTWGDlJesafcRKzt3Cc T9AA== X-Gm-Message-State: ALoCoQkXISya+JgGD1je1PXJ0kEV4le3MKOHltEz5ViVKnTFzEWPTb8NWZlIEX1oF+jau/2lCIHT MIME-Version: 1.0 X-Received: by 10.180.98.233 with SMTP id el9mr7359543wib.3.1416349237329; Tue, 18 Nov 2014 14:20:37 -0800 (PST) Received: by 10.27.93.147 with HTTP; Tue, 18 Nov 2014 14:20:37 -0800 (PST) Date: Tue, 18 Nov 2014 14:20:37 -0800 Message-ID: Subject: What is the difference in explain output between Bucket Map Join and normal Map Join? From: Hao Zhu To: user@hive.apache.org Content-Type: multipart/alternative; boundary=f46d044288962b74e3050829818a X-Virus-Checked: Checked by ClamAV on apache.org --f46d044288962b74e3050829818a Content-Type: text/plain; charset=UTF-8 Hi All, I am testing the different explain plan output and found it hard to find the difference between Bucket Map Join(non sort merged) and normal Map Join. The tested hive version is 0.13. To enable Bucket Map Join, there are 2 conditions: 1. set hive.optimize.bucketmapjoin = true; 2. If the tables being joined are bucketized on the join columns, and the number of buckets in one table is a multiple of the number of buckets in the other table, the buckets can be joined with each other. Here are my tests: *1. Create 2 Bucketed Tables(non-sorted.)* create table b1(col0 string,col1 string,col2 string,col3 string,col4 string,col5 string,col6 string) clustered by (col0) into 32 buckets; create table b2(col0 string,col1 string,col2 string,col3 string,col4 string,col5 string,col6 string) clustered by (col0) into 8 buckets; set hive.enforce.bucketing = true; >From passwords insert OVERWRITE table b1 select * limit 10000; >From passwords insert OVERWRITE table b2 select * limit 10000; *2. set hive.optimize.bucketmapjoin = true;* > hive> explain select b1.* from b1,b2 where b1.col0=b2.col0; > OK > STAGE DEPENDENCIES: > Stage-4 is a root stage > Stage-3 depends on stages: Stage-4 > Stage-0 is a root stage > STAGE PLANS: > Stage: Stage-4 > Map Reduce Local Work > Alias -> Map Local Tables: > b1 > Fetch Operator > limit: -1 > Alias -> Map Local Operator Tree: > b1 > TableScan > alias: b1 > Statistics: Num rows: 10000 Data size: 478869 Basic stats: > COMPLETE Column stats: NONE > HashTable Sink Operator > condition expressions: > 0 {col0} {col1} {col2} {col3} {col4} {col5} {col6} > 1 {col0} > keys: > 0 col0 (type: string) > 1 col0 (type: string) > Stage: Stage-3 > Map Reduce > Map Operator Tree: > TableScan > alias: b2 > Statistics: Num rows: 10000 Data size: 478869 Basic stats: > COMPLETE Column stats: NONE > Map Join Operator > condition map: > Inner Join 0 to 1 > condition expressions: > 0 {col0} {col1} {col2} {col3} {col4} {col5} {col6} > 1 {col0} > keys: > 0 col0 (type: string) > 1 col0 (type: string) > outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, > _col6, _col9 > Statistics: Num rows: 11000 Data size: 526755 Basic stats: > COMPLETE Column stats: NONE > Filter Operator > predicate: (_col0 = _col9) (type: boolean) > Statistics: Num rows: 5500 Data size: 263377 Basic stats: > COMPLETE Column stats: NONE > Select Operator > expressions: _col0 (type: string), _col1 (type: string), > _col2 (type: string), _col3 (type: string), _col4 (type: string), _col5 > (type: string), _col6 (type: string) > outputColumnNames: _col0, _col1, _col2, _col3, _col4, > _col5, _col6 > Statistics: Num rows: 5500 Data size: 263377 Basic > stats: COMPLETE Column stats: NONE > File Output Operator > compressed: false > Statistics: Num rows: 5500 Data size: 263377 Basic > stats: COMPLETE Column stats: NONE > table: > input format: > org.apache.hadoop.mapred.TextInputFormat > output format: > org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat > serde: > org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe > Local Work: > Map Reduce Local Work > Stage: Stage-0 > Fetch Operator > limit: -1 > Time taken: 0.171 seconds, Fetched: 63 row(s) So we only see normal Map Join operator. *My question is :* - What operator will Bucket map join show in explain output? - The same as normal Map Join?If so, how do we know if the SQL is using bucket map join or normal map join? Thanks. Thanks, Hao --f46d044288962b74e3050829818a Content-Type: text/html; charset=UTF-8 Content-Transfer-Encoding: quoted-printable
Hi All,

I am testing the different expl= ain plan output and found it hard to find the difference between=C2=A0Bucke= t Map Join(non sort merged) and normal Map Join.

T= he tested hive version is 0.13.
To enable Bucket Map Join, there = are 2 conditions:
1.=C2=A0set hive.optimize.bucketmapjoin =3D tru= e;
2. If the tables being joined are bucketized on the join colum= ns, and the number of buckets in one table is a multiple of the number of b= uckets in the other table, the buckets can be joined with each other.=C2=A0=

Here are my tests:
1. Create 2 = Bucketed Tables(non-sorted.)
create table b1(col0 string,col1= string,col2 string,col3 string,col4 string,col5 string,col6 string)
<= div>clustered by (col0) into 32 buckets;
create table b2(col0 str= ing,col1 string,col2 string,col3 string,col4 string,col5 string,col6 string= )
clustered by (col0) into 8 buckets;
set hive.enforce.= bucketing =3D true;=C2=A0
From passwords insert OVERWRITE =C2=A0t= able b1 select * limit 10000;
From passwords insert OVERWRITE =C2= =A0table b2 select * limit 10000;

2. set = hive.optimize.bucketmapjoin =3D true;
hive> explain=C2=A0 select=C2=A0 b1.* from b1,b2 where = b1.col0=3Db2.col0;=C2=A0 =C2=A0
OK
STAGE DEPENDENCIES:
=C2=A0 Stag= e-4 is a root stage
=C2=A0 Stage-3 depends on stages: Stage-4
=C2=A0 = Stage-0 is a root stage
STAGE PLANS:
=C2=A0 Stage: Stage-4
=C2=A0 = =C2=A0 Map Reduce Local Work
=C2=A0 =C2=A0 =C2=A0 Alias -> Map Local = Tables:
=C2=A0 =C2=A0 =C2=A0 =C2=A0 b1=C2=A0
=C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 Fetch Operator
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 limi= t: -1
=C2=A0 =C2=A0 =C2=A0 Alias -> Map Local Operator Tree:
=C2= =A0 =C2=A0 =C2=A0 =C2=A0 b1=C2=A0
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 Tab= leScan
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 alias: b1
=C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 Statistics: Num rows: 10000 Data size: 4788= 69 Basic stats: COMPLETE Column stats: NONE
=C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 HashTable Sink Operator
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0= =C2=A0 =C2=A0 condition expressions:
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0= =C2=A0 =C2=A0 =C2=A0 0 {col0} {col1} {col2} {col3} {col4} {col5} {col6}=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 1 {col0}
=C2=A0= =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 keys:
=C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 0 col0 (type: string)
=C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 1 col0 (type: string)
=C2=A0 S= tage: Stage-3
=C2=A0 =C2=A0 Map Reduce
=C2=A0 =C2=A0 =C2=A0 Map Opera= tor Tree:
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 TableScan
=C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0 =C2=A0 alias: b2
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0= =C2=A0 Statistics: Num rows: 10000 Data size: 478869 Basic stats: COMPLETE= Column stats: NONE
=C2=A0= =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 Map Join Operator
=C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 condition map:
=C2=A0=C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 Inner Join 0 to 1
= =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 condition expressions:
= =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 0 {col0} {col1} {co= l2} {col3} {col4} {col5} {col6}
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 1 {col0}
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 keys:
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 0 col0= (type: string)
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = 1 col0 (type: string)
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 o= utputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col9=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 Statistics: Num rows: 110= 00 Data size: 526755 Basic stats: COMPLETE Column stats: NONE
=C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 Filter Operator
=C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 predicate: (_col0 =3D _col9) (type: = boolean)
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 Statist= ics: Num rows: 5500 Data size: 263377 Basic stats: COMPLETE Column stats: N= ONE
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 Select Opera= tor
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 expre= ssions: _col0 (type: string), _col1 (type: string), _col2 (type: string), _= col3 (type: string), _col4 (type: string), _col5 (type: string), _col6 (typ= e: string)
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6
= =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 Statistics: = Num rows: 5500 Data size: 263377 Basic stats: COMPLETE Column stats: NONE=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 File Outpu= t Operator
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 compressed: false
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0 =C2=A0 Statistics: Num rows: 5500 Data size: 263377 Ba= sic stats: COMPLETE Column stats: NONE
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 table:
=C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 input format: o= rg.apache.hadoop.mapred.TextInputFormat
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 output format: org.apa= che.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
=C2=A0 =C2=A0 =C2=A0= =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 serde: org.= apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
=C2=A0 =C2=A0 =C2=A0 Loca= l Work:
=C2=A0 =C2=A0 =C2=A0 =C2=A0 Map Reduce Local Work
=C2=A0 Stag= e: Stage-0
=C2=A0 =C2=A0 Fetch Operator
=C2=A0 =C2=A0 =C2=A0 limit: -= 1
Time taken: 0.171 seconds, Fetched: 63 row(s)

So we only see normal Map Join operator.

= My question is :

  • What operator will Bucket= map join show in explain output?
  • The same as normal Map Join?I= f so, how do we know if the SQL is using bucket map join or normal map join= ?

Thanks.


Thanks,

Hao

--f46d044288962b74e3050829818a--