hive-user mailing list archives

From Mahender Sarangam <Mahender.BigD...@outlook.com>
Subject Re: Hive ORC Table
Date Sat, 21 Jan 2017 14:35:20 GMT
Yes, I tried the option below, but I'm not sure about the workload (data ingestion), so I can't go with a fixed hard-coded value. I would like to know the reason for getting 1009 reducer tasks.

On 1/20/2017 7:45 PM, goun na wrote:
Hi Mahender ,

1st :
Didn't the following option work in Tez?

set mapreduce.job.reduces=100
or
set mapred.reduce.tasks=100 (deprecated)
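(Editor's note: in Hive on Tez, the reducer count is usually derived from the estimated data size rather than taken directly from mapreduce.job.reduces, so the following settings are often more effective. This is a sketch; exact defaults vary by Hive version.)

    set hive.tez.auto.reducer.parallelism=true;
    -- Tez sizes reducer parallelism from estimated bytes per reducer:
    set hive.exec.reducers.bytes.per.reducer=268435456;  -- 256 MB per reducer
    -- Hard cap on the reducer count (defaults to 1009 in recent Hive versions):
    set hive.exec.reducers.max=100;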

2nd :
Possibly data skew. It sometimes happens when join keys contain NULLs.
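(Editor's note: all NULL keys hash to the same reducer, so one common workaround is to keep NULL join keys out of the shuffle. A sketch, with made-up table and column names; since NULL keys never match in a join, this rewrite preserves LEFT JOIN semantics.)

    SELECT a.*, b.val
    FROM big_table a
    LEFT JOIN small_table b
      ON a.k = b.k
    WHERE a.k IS NOT NULL
    UNION ALL
    -- NULL-key rows bypass the join entirely:
    SELECT a.*, NULL AS val
    FROM big_table a
    WHERE a.k IS NULL;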

Goun


2017-01-21 9:58 GMT+09:00 Mahender Sarangam <Mahender.BigData@outlook.com>:
Hi All,

We have an ORC table which is about 2 GB in size. Whenever we run a query on
top of this ORC table, Tez launches 1009 reducers. From what I found, 1009 is
the maximum number of Tez reducer tasks. Is there a way to reduce the number
of reducers? I also see that some of the files underlying the ORC table are
500 MB or 1 GB each. Is there a way to make the files roughly the same size?
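(Editor's note: to even out the underlying ORC file sizes, one option is to let Hive merge files at write time, or to rewrite an existing table's files in place. A sketch, assuming a non-transactional ORC table and a Hive version that supports CONCATENATE; the table name is illustrative.)

    -- Merge small/uneven output files at write time:
    set hive.merge.tezfiles=true;
    set hive.merge.smallfiles.avgsize=256000000;
    set hive.merge.size.per.task=256000000;

    -- Or merge an existing ORC table's files in place:
    ALTER TABLE my_orc_table CONCATENATE;
    -- For a partitioned table, concatenate per partition:
    -- ALTER TABLE my_orc_table PARTITION (dt='2017-01-20') CONCATENATE;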


My second scenario: we have a query that joins 5 tables, all with LEFT JOINs.
The query runs fast until it reaches 99%; going from 99% to 100% takes too
much time. We are not using our partition column in the LEFT JOIN conditions.
Is there a better way to resolve this hang at 99%? My table is about 20 GB,
and we are left joining it with another table of about 90 million
(9,00,00,000) records.
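(Editor's note: a stall at 99% is the classic symptom of one reducer receiving most of the data because a single join-key value, often NULL, dominates. Some settings worth trying; these exist in Hive, but their behavior varies by version and execution engine, and the skew-join optimization was designed for MapReduce, so verify it takes effect on Tez.)

    -- Let Hive split heavily skewed keys across reducers:
    set hive.optimize.skewjoin=true;
    set hive.skewjoin.key=100000;  -- rows per key before it is treated as skewed

    -- If the joined tables are small enough, a map-side join avoids the shuffle:
    set hive.auto.convert.join=true;
    set hive.auto.convert.join.noconditionaltask.size=256000000;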


Mahens


