hive-user mailing list archives

From Guy Doulberg <Guy.Doulb...@perion.com>
Subject FW: Big table join optimization
Date Fri, 31 Jan 2014 00:07:00 GMT

Hi guys,

I am trying to optimize a Hive query that joins two big tables. The join takes too long: no
matter how many reducers I set, there are always two reducers struggling to finish at the end
of the job. The job does not always complete; sometimes it fails with memory problems.

In the reducers that finish quickly I see:
7688459 rows: used memory = 991337736

In the long-running reducers:

43363436 rows: used memory = 1142368456

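For reference, the slow reducers handle about 5.6 times as many rows as the fast ones (43,363,436 vs 7,688,459). Below is a minimal Python sketch of why adding reducers may not help if a few join keys dominate. It is a simplified model, not Hive's code: it assumes every row with the same join key goes to one reducer picked by hashing the key, and `zlib.crc32` stands in for Hive's own key hash. The key names and row counts are hypothetical.

```python
# Simplified model, NOT Hive's implementation: a reduce-side join sends every
# row with the same join key to the same reducer, chosen by hashing the key.
import zlib
from collections import Counter


def reducer_for(key: str, num_reducers: int) -> int:
    """Pick a reducer for a key. crc32 stands in for Hive's own key hash."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def rows_per_reducer(keys, num_reducers: int) -> list:
    """Return the number of rows each reducer receives, indexed by reducer."""
    load = Counter(reducer_for(k, num_reducers) for k in keys)
    return [load[r] for r in range(num_reducers)]


def example_keys() -> list:
    """Hypothetical join keys: 1,000 ordinary keys with 10 rows each,
    plus two hot keys with 50,000 rows each."""
    keys = [f"user{i}" for i in range(1000) for _ in range(10)]
    keys += ["hot_a"] * 50_000 + ["hot_b"] * 50_000
    return keys


if __name__ == "__main__":
    keys = example_keys()
    for n in (10, 50, 200):
        load = rows_per_reducer(keys, n)
        # The busiest reducer never drops below 50,000 rows, because a
        # single key cannot be split across reducers in this model.
        print(f"{n:>3} reducers: busiest={max(load)}, mean={sum(load) / n:.0f}")
```

In this model the mean load per reducer shrinks as reducers are added, but the busiest reducer stays pinned by the hot keys, which matches the "always two slow reducers" pattern.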

At first I thought I was dealing with a skewed key, but setting hive.optimize.skewjoin to true
didn't change anything, and neither did playing with hive.skewjoin.key.
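A rough Python sketch of the idea behind a runtime skew join, for context. This is a simplified model and NOT Hive's implementation. It assumes that keys whose row count exceeds a threshold (the role hive.skewjoin.key plays) are pulled out of the reduce-side join and joined separately using an in-memory hash table, map-join style. The function names are hypothetical, and the strict `>` comparison is an assumption.

```python
# Simplified model, NOT Hive's implementation, of the idea behind
# hive.optimize.skewjoin: keys with more rows than a threshold are diverted
# from the normal reduce-side join and joined with an in-memory hash table.
from collections import Counter, defaultdict


def split_by_threshold(rows, threshold):
    """Split (key, value) rows into normal and skewed lists by key count.

    The strict '>' comparison is an assumption of this sketch.
    """
    rows = list(rows)  # iterated twice below
    counts = Counter(k for k, _ in rows)
    normal, skewed = [], []
    for row in rows:
        (skewed if counts[row[0]] > threshold else normal).append(row)
    return normal, skewed


def hash_join(stream_side, build_side):
    """Inner join: load build_side into a dict, then stream the other side."""
    table = defaultdict(list)
    for k, v in build_side:
        table[k].append(v)
    return [(k, sv, bv) for k, sv in stream_side for bv in table.get(k, [])]


if __name__ == "__main__":
    # Hypothetical rows: one hot key plus two ordinary keys.
    rows = [("hot", i) for i in range(5)] + [("a", 1), ("b", 2)]
    normal, skewed = split_by_threshold(rows, threshold=3)
    print("normal path:", normal)
    print("skewed path:", hash_join(skewed, [("hot", "x")]))
```

In this model only keys above the threshold leave the normal reduce path. If no single key crosses it, the skewed list is empty and the job behaves exactly as before.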

Any other ideas I can try?

I am using Hive 0.10 from CDH4.2.1.

The source tables use custom SerDes.


Thanks
Guy Doulberg
Team leader @ Perion
