hive-user mailing list archives

From Rajesh Balamohan <rajesh.balamo...@gmail.com>
Subject Re: config recommendations to boost performance
Date Wed, 25 Feb 2015 10:43:59 GMT
>>
A query like "select name,count(id) from table where date='2015-01-01' or
date='2015-01-02' group by (name)" takes almost forever and needs to be
cancelled after ~30min.
>>

Ideally, it should scan only the 2 partitions. Do you see any containers
being launched before you have to kill the job, or is the split
computation itself taking that long?
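One way to check the first point is to ask Hive which partitions the query would actually read. This is a sketch that assumes the Hive version in use supports EXPLAIN DEPENDENCY; `events` is a hypothetical stand-in for the real table name, and `date` is backquoted because it is also a type keyword:

```sql
-- Prints the input tables and partitions the query depends on.
-- If partition pruning works, only the two date partitions should be listed.
EXPLAIN DEPENDENCY
SELECT name, count(id)
FROM events                       -- hypothetical table name
WHERE `date` = '2015-01-01' OR `date` = '2015-01-02'
GROUP BY name;
```

If more than two partitions appear, the time is likely going into scanning (and split computation over) far more data than intended.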

~Rajesh.B


On Wed, Feb 25, 2015 at 1:35 PM, Gerd König <koenig.bodensee@googlemail.com> wrote:

> Hi,
>
> I'm a bit stuck optimizing the Hive/Tez config parameters to speed up
> query execution.
> The cluster consists of 6 worker nodes (with a component ratio that is
> rather non-ideal for Hadoop, but that's a given), each with 48 cores,
> 384 GB RAM and 10 HDDs.
> The Hive table is configured as:
> - partitioned by day
> - 12 buckets (bucketed on a smallint column)
> - transactional=true
> - snappy compressed ORC format
> and it contains about 200TB of data.
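For reference, a table with the layout described above could be declared roughly as follows. This is a sketch in Hive 0.14 ACID syntax; the table name `events` and the bucketing column `bucket_key` are hypothetical, and only `name`, `id` and `date` come from the query in this thread:

```sql
-- Hypothetical table: daily partitions, 12 buckets on a SMALLINT column,
-- ACID (transactional) and Snappy-compressed ORC.
CREATE TABLE events (
  name       STRING,
  id         BIGINT,
  bucket_key SMALLINT             -- hypothetical bucketing column
)
PARTITIONED BY (`date` STRING)
CLUSTERED BY (bucket_key) INTO 12 BUCKETS
STORED AS ORC
TBLPROPERTIES ('transactional' = 'true', 'orc.compress' = 'SNAPPY');
```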
> Newly arrived data is inserted every 5 minutes (if there is any), which
> of course can lead to a high number of delta files.
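To put a rough number on "a high number of delta files": assuming every 5-minute insert lands in the current day's partition, writes one file per bucket, and no compaction has run yet, the upper bound works out as follows:

```latex
\begin{aligned}
\text{inserts per day: } & \frac{24 \times 60\ \text{min}}{5\ \text{min}} = 288 \\
\text{files per day partition: } & 288 \times 12\ \text{buckets} = 3456 \\
\text{files for the two-day query: } & 2 \times 3456 = 6912
\end{aligned}
```

Each of these files is a separate ORC file that has to be opened and merged on read, which is why compaction matters so much for this workload.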
>
> A query like "select name,count(id) from table where date='2015-01-01' or
> date='2015-01-02' group by (name)" takes almost forever and needs to be
> cancelled after ~30min.
>
> Of course, Hive will never be a performance beast, but by executing with
> Tez I hoped to get much better performance...
>
> Some current settings:
> yarn.nodemanager.resource.memory-mb : 304640
> yarn.scheduler.minimum-allocation-mb : 15360
> mapreduce.map.memory.mb : 20480
> mapreduce.reduce.memory.mb : 25600
> mapreduce.map.java.opts : -Xmx12288m
> mapreduce.reduce.java.opts : -Xmx15360m
> set hive.execution.engine=tez;
> set tez.queue.name=highresourcequeue;
> set tez.am.grouping.min-size=268435456;
> set hive.exec.reducers.max=6;
> set mapreduce.job.reduces=6;
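A worked sketch of how these settings pack onto one node. It assumes the CapacityScheduler's default resource calculator, which rounds each memory request up to a multiple of yarn.scheduler.minimum-allocation-mb (the FairScheduler uses its own increment instead), and that Hive on Tez sizes its containers from mapreduce.map.memory.mb, which is the fallback when hive.tez.container.size is left at -1:

```latex
\begin{aligned}
\text{map request: } & \lceil 20480 / 15360 \rceil \times 15360 = 30720\ \text{MB} \\
\text{reduce request: } & \lceil 25600 / 15360 \rceil \times 15360 = 30720\ \text{MB} \\
\text{containers per node: } & \lfloor 304640 / 30720 \rfloor = 9 \\
\text{containers cluster-wide: } & 6 \times 9 = 54
\end{aligned}
```

Under these assumptions at most 9 tasks run concurrently on a 48-core node, and each container reserves 30720 MB while its JVM heap is capped at 12288 or 15360 MB.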
>
>
> My thoughts are:
> - improve the data ingestion to reduce the number of delta files and
> thereby the number of mappers required
> - tune the automatic compaction settings to further reduce the number of
> files and, in turn, the number of mappers
> - the YARN config should be OK (see properties above)
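As a sketch for the compaction item: a major compaction can also be requested by hand for a single partition (Hive 0.13+ syntax; `events` is a hypothetical table name). Automatic compaction is driven by metastore-side properties such as hive.compactor.initiator.on, hive.compactor.worker.threads and hive.compactor.delta.num.threshold, which belong in the metastore's hive-site.xml rather than in a session `set`:

```sql
-- Queue a major compaction for one day partition: the base and all
-- delta directories are rewritten into a single new base.
ALTER TABLE events PARTITION (`date` = '2015-01-01') COMPACT 'major';

-- Show queued, running and recently finished compactions.
SHOW COMPACTIONS;
```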
>
> What are the main Tez/Hive properties to check or adjust that could
> improve performance in this environment?
>
> Many thanks in advance, G.
>

