hive-user mailing list archives

From Bejoy Ks <bejoy...@yahoo.com>
Subject Re: how is number of mappers determined in mapside join?
Date Mon, 19 Mar 2012 12:48:09 GMT
Hi Bruce
      In a map-side join the smaller table is loaded into memory, so the number of mappers
depends only on the data in the larger table. Say CombineHiveInputFormat is used and
we have an HDFS block size of 32 MB, a min split size of 1 B and a max split size of 256 MB. That
means each mapper processes a chunk of data no smaller than 1 B and no larger than 256 MB.
Mappers are triggered on that basis, so one possibility in your case is:
mapper 1 - 200 MB
mapper 2 - 120 MB
mapper 3 - 140 MB
Every mapper processes data whose size is between 1 B and 256 MB, for a total of 460 MB, your
table size.
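
To make the bounds concrete, here is a minimal Python sketch of the arithmetic only. It is not Hive or Hadoop code, and the function names are hypothetical. It assumes the only constraints are the min and max split sizes; in practice the way blocks are grouped also depends on where they are stored, so the real mapper count can be higher than the lower bound computed here.

```python
import math

MB = 1024 * 1024

def min_mappers(table_bytes, max_split):
    # Lower bound: no combined split may exceed max_split,
    # so at least ceil(table / max_split) mappers are needed.
    return math.ceil(table_bytes / max_split)

def is_valid_assignment(chunks, table_bytes, min_split, max_split):
    # Every chunk must lie within [min_split, max_split] and
    # the chunks together must cover the whole table.
    within_bounds = all(min_split <= c <= max_split for c in chunks)
    return within_bounds and sum(chunks) == table_bytes

table = 460 * MB
print(min_mappers(table, 256 * MB))  # 2
print(is_valid_assignment([200 * MB, 120 * MB, 140 * MB], table, 1, 256 * MB))  # True
```

So 3 mappers is consistent with these settings: it is above the lower bound of 2, and each chunk stays under 256 MB.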

I'm not sure about the formula you posted here. Can you point me to the document you
got it from?

Regards
Bejoy


________________________________
 From: Bruce Bian <weidong.ban@gmail.com>
To: user@hive.apache.org 
Sent: Monday, March 19, 2012 2:42 PM
Subject: how is number of mappers determined in mapside join?
 

Hi there,
When I execute the following queries in Hive:

set hive.auto.convert.join = true;
CREATE TABLE IDAP_ROOT as
SELECT a.*,b.acnt_no
FROM idap_pi_root a LEFT OUTER JOIN idap_pi_root_acnt b ON a.acnt_id=b.acnt_id;

the number of mappers run in the map-side join is 3. How is it determined? When launching
a job in Hadoop MapReduce, I know the split size is determined by the function
max(min split size, min(max split size, HDFS block size)), which in my configuration is
max(1B, min(256MB, 32MB)) = 32MB, and the two tables are 460MB and 1.5MB respectively.
So I expected around 15 mappers to be launched, which is not the case.
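
For reference, the estimate above works out as follows. This is a small Python sketch of the quoted formula only, with hypothetical function names; it assumes one mapper per split and ignores the 1.5MB table.

```python
import math

MB = 1024 * 1024

def split_size(min_split, max_split, block_size):
    # The formula quoted above:
    # max(min split size, min(max split size, HDFS block size))
    return max(min_split, min(max_split, block_size))

def estimated_mappers(table_bytes, size):
    # One mapper per split; the last split may be partially filled.
    return math.ceil(table_bytes / size)

size = split_size(1, 256 * MB, 32 * MB)
print(size // MB)                         # 32
print(estimated_mappers(460 * MB, size))  # 15
```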

Thanks
Bruce