hive-user mailing list archives

From "Bejoy KS " <bejo...@outlook.com>
Subject Fw: Cartesian Product in HIVE
Date Mon, 01 Oct 2012 04:30:32 GMT
Hi Abhishek

Both your tables are ideal candidates for map join.

Can you first try a plain join statement without setting any properties other than the
number of reducers, and then try a map join as the next step?

hive> set mapred.reduce.tasks=5;
hive> SELECT t2.col1, t3.col1 FROM table2 t2 JOIN table3 t3;

Once this goes well, try a map-side join.
hive> set hive.auto.convert.join=true;
hive> SELECT t2.col1, t3.col1 FROM table2 t2 JOIN table3 t3;
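
As a rough sketch of the map-join step above, the plan can be checked with EXPLAIN before running the full query. The threshold value shown and the choice of t3 as the in-memory table are assumptions for illustration, not settings confirmed anywhere in this thread, and newer Hive versions may ignore the MAPJOIN hint unless hive.ignore.mapjoin.hint is set to false.

```sql
-- Sketch only: the values here are assumptions, adjust them for your cluster.
-- Let Hive convert the join to a map-side join when one table is small enough.
SET hive.auto.convert.join=true;
-- Size threshold in bytes below which a table is treated as "small"
-- (25000000 is the commonly documented default).
SET hive.mapjoin.smalltable.filesize=25000000;

-- Inspect the plan first. A map-side join should typically show a
-- "Map Join Operator" instead of a reduce-side join.
EXPLAIN
SELECT t2.col1, t3.col1
FROM table2 t2 JOIN table3 t3;

-- Alternative: request the map join explicitly, loading the smaller
-- table (t3, about 0.33 MB in this thread) into memory on each mapper.
SELECT /*+ MAPJOIN(t3) */ t2.col1, t3.col1
FROM table2 t2 JOIN table3 t3;
```

If the plan still shows a reduce-side join, the table size statistics Hive sees for table3 would be the first thing to compare against the threshold.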

------Original Message------
From: Abhishek
To: user@hive.apache.org
Cc: user@hive.apache.org
Cc: Bejoy Ks
Subject: Re: Cartesian Product in HIVE
Sent: Oct 1, 2012 09:32

Thanks for the reply, Bejoy. I did not use any ORDER BY in the query. Here are the
properties I used, the query, and the table sizes:

set mapred.reduce.tasks=17;
set mapred.child.java.opts=xmx2073741824;
set io.sort.mb=512;
set io.sort.factor=250;
set mapred.reduce.parallel.copies=true;
set mapred.job.reuse.jvm.num.tasks=1;
set hive.mapred.reduce.tasks.speculative.execution=false;
set hive.mapred.map.tasks.speculative.execution=false;

CREATE TABLE t1 AS
SELECT /*+ STREAMTABLE(t2) */ t2.col1, t3.col1
FROM table2 t2 JOIN table3 t3

table2: 997406 rows, total bytes: 20848934 (19.88 MB)
table3: 20773 rows, total bytes: 353127 (0.33 MB)

# of mappers: 4
# of reducers: 1

Regards
Abhi

On Sep 30, 2012, at 9:35 AM, Bejoy KS <bejoyks@outlook.com> wrote:

Hi Abhishek

A map join does not need any similar columns to work. It just moves the join processing
to the mapper rather than doing the same in a reducer.

The actual bottleneck is the single reducer. We need to figure out why only one reducer
is fired rather than the set value of 17. Are you using ORDER BY in your query? If so, it
sets the number of reducers to 1.

Can you provide the full console stack here so that we'll be able to understand your issue
and help you better (starting from the properties you set, your query, and the error)?
Also, can you get the exact data sizes for the two tables?

Regards
Bejoy KS

> From: abhishek.dodda1@gmail.com
> Date: Sat, 29 Sep 2012 07:44:06 -0700
> Subject: Re: Cartesian Product in HIVE
> To: user@hive.apache.org; bejoy_ks@yahoo.com
>
> Thanks for the reply Bejoy.
>
> I tried a map join by setting the property you mentioned, and even
> increased the small table file size. The 20k table would not be more
> than 200 MB, but it does not work.
>
> This is a Cartesian product of the tables; they don't have any similar
> columns. Does map join work here?
>
> With the settings below and the STREAMTABLE hint, it was processing
> around 5 billion rows per hour, so the process took around 4 hrs.
>
> Set io.sort.mb=512
> Set mapred.reduce.tasks=17
> Set io.sort.factor=256
> Set
Regards
Bejoy KS

Sent from handheld, please excuse typos.
