hive-dev mailing list archives

From: Ranjan Banerjee
Subject: RE: Oversized container estimation
Date: Sat, 26 Nov 2016 02:01:20 GMT
Hi Rajesh,
   Thanks a lot for the insight. When you say CPU, are you referring to YARN vcores?
The YARN minimum container size (yarn.scheduler.minimum-allocation-mb) is set to 1.5 GB and
the minimum cores per container (yarn.scheduler.minimum-allocation-vcores) is set to 1.

Are you saying that if the container-to-vcore ratio is not 1:1, then merely increasing the
number of containers will not help, because the containers will not all get a vcore at the
same time to process their tasks?

Thanks for the help!!


-----Original Message-----
From: Rajesh Balamohan
Sent: Friday, November 25, 2016 5:40 PM
Subject: Re: Oversized container estimation

Those are cumulative figures at the DAG level. You may want to check the GC logs emitted at
the task level to see whether the full memory is actually used. I am not sure what YARN
minimum container size is specified in your cluster, but depending on it, lowering the
container size risks running too many containers on the same node (e.g. 49 containers on a
98 GB machine with 2 GB as both the Hive container size and the YARN minimum container size).
If you have only 32 CPUs in your system, this would oversubscribe them heavily and could
adversely impact job performance.
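
The arithmetic in that example can be checked with a short Python sketch. It assumes containers are packed by memory alone and that every container wants one CPU at a time; the helper names are made up for illustration, and the 4 GB and 1.5 GB rows simply reuse the container sizes mentioned elsewhere in this thread.

```python
def max_containers(node_memory_gb: float, container_gb: float) -> int:
    """Containers that fit on one node when packing by memory only (assumption)."""
    return int(node_memory_gb // container_gb)


def containers_per_cpu(containers: int, cpus: int) -> float:
    """Ratio above 1.0 means more runnable containers than CPUs."""
    return containers / cpus


if __name__ == "__main__":
    node_memory_gb = 98  # machine size from the example
    cpus = 32            # CPU count from the example
    for size_gb in (4, 2, 1.5):
        n = max_containers(node_memory_gb, size_gb)
        ratio = containers_per_cpu(n, cpus)
        print(f"{size_gb} GB containers: {n} per node, {ratio:.2f} per CPU")
```

With 2 GB containers this gives 49 containers for 32 CPUs, roughly 1.5 containers competing for each CPU, which is the oversubscription described above.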


On Fri, Nov 25, 2016 at 11:03 PM, Ranjan Banerjee wrote:

> Hi everyone,
> I have a cluster where each container is configured at 4 GB, and some of
> my queries finish in 30 to 40 seconds. This leads me to believe that my
> containers have too much memory, and I am thinking of reducing the
> container size (hive.tez.container.size) to 1.5 GB. However, I am looking
> for a few more concrete data points to confirm whether my containers
> really are oversized.
> I looked into the Tez view of my DAG, and the counters give me:
> VIRTUAL_MEMORY_BYTES 1560263561216
> I am guessing this is wrong, as there is no way the query could finish in
> 20 seconds on a 98 GB cluster if the actual memory required by the query
> is 907 GB. Any help finding data points for determining whether
> containers are oversized is very much appreciated!
> Thanks
> Ranjan
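
As a rough illustration of why a DAG-level counter can exceed the cluster's memory, the Python sketch below converts the VIRTUAL_MEMORY_BYTES value quoted above into GiB and spreads it over a task count. The task count is purely hypothetical (the thread does not give one), and the helper name is made up. Note also that virtual memory measures reserved address space, not memory actually resident, so even the per-task figure is only an upper bound.

```python
def bytes_to_gib(n_bytes: int) -> float:
    """Convert a byte count to GiB (2**30 bytes)."""
    return n_bytes / 2**30


if __name__ == "__main__":
    virtual_memory_bytes = 1560263561216  # counter value quoted in the thread
    task_count = 400                      # hypothetical; not stated in the thread

    total_gib = bytes_to_gib(virtual_memory_bytes)
    per_task_gib = total_gib / task_count
    print(f"cumulative: {total_gib:.1f} GiB")
    print(f"per task (assuming {task_count} tasks): {per_task_gib:.2f} GiB")
```

Because the counter is summed over every task in the DAG, only the per-task figure is comparable to the container size, and the task-level GC logs remain the better source for actual heap usage.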
