hive-user mailing list archives

From goun na <gou...@gmail.com>
Subject Re: Hive ORC Table
Date Sun, 22 Jan 2017 10:56:23 GMT
Please refer to the document below as well:

Hive on Tez Performance Tuning - Determining Reducer Counts
https://community.hortonworks.com/articles/22419/hive-on-tez-performance-tuning-determining-reducer.html
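
For reference, here is a minimal sketch of the settings that, as I understand that article, feed the reducer estimate on Tez. The property names are existing Hive settings; the values are illustrative assumptions only and are not tuned for your workload:

```sql
-- Upper bound on the number of reducers. The default in recent Hive
-- versions is 1009, the same number you are seeing.
SET hive.exec.reducers.max=1009;

-- Target input size per reducer, in bytes. Raising it lowers the
-- estimated reducer count. 1 GB here is an illustrative value.
SET hive.exec.reducers.bytes.per.reducer=1073741824;

-- Let Tez adjust the reducer count at runtime based on the observed
-- output sizes of the upstream vertices.
SET hive.tez.auto.reducer.parallelism=true;
```

Unlike a fixed reducer count, these settings scale with the input size, so they may suit a workload whose ingestion volume varies.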

I hope the article gives you some insight into how Tez works internally.
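
On the join that stalls at 99%: one way to check whether skew is the cause is to count rows per join key on each side of the join. This is only a sketch; `big_table` and `join_key` are hypothetical names standing in for your own table and column:

```sql
-- Count rows per join key and list the heaviest keys first.
-- A NULL key, or a single value with a very large count, suggests
-- that one reducer receives most of the rows for that join.
SELECT join_key, COUNT(*) AS row_count
FROM big_table
GROUP BY join_key
ORDER BY row_count DESC
LIMIT 20;
```

If one key dominates the result, the slow final 1% is probably the reducer handling that key.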

2017-01-21 23:35 GMT+09:00 Mahender Sarangam <Mahender.BigData@outlook.com>:

> Yes, I tried the option below, but I'm not sure about the workload (data
> ingestion). I can't go with a fixed, hard-coded value. I would like to
> know the reason for getting 1009 reducer tasks.
>
> On 1/20/2017 7:45 PM, goun na wrote:
>
> Hi Mahender,
>
> 1st:
> Didn't the following option work in Tez?
>
> set mapreduce.job.reduces=100
> or
> set mapred.reduce.tasks=100 (deprecated)
>
> 2nd:
> Possibility of data skew. It sometimes happens when handling nulls.
>
> Goun
>
>
> 2017-01-21 9:58 GMT+09:00 Mahender Sarangam <Mahender.BigData@outlook.com>:
>
>> Hi All,
>>
>> We have an ORC table that is 2 GB in size. Whenever we run an operation
>> on this table, Tez settles on 1009 reducers every time. From my
>> searching, 1009 is considered the maximum number of Tez tasks. Is
>> there a way to reduce the number of reducers? The files generated
>> under the ORC table vary in size; some are 500 MB, some 1 GB, and so
>> on. Is there a way to make the files the same size?
>>
>>
>> My second scenario: we have a join on 5 tables, all of them left joins.
>> The query runs fast until it reaches 99%, but going from 99% to 100%
>> takes too much time. We do not use our partition column in the LEFT
>> JOIN statement. Is there a better way to resolve the hang at 99%? My
>> table is 20 GB, and we are left joining it with another table of
>> 9,00,00,000 (90 million) records.
>>
>>
>> Mahens
>>
>>
>
>
