impala-user mailing list archives

From 廖松博 <liaoson...@gridsum.com>
Subject RE: Impala user resource isolation best practice
Date Mon, 18 Jul 2016 23:29:21 GMT
Hi Matthew,
	Thanks for your reply. The point is that, per the admission control documentation, most Impala limits
are "soft limits". Are the two settings you mentioned also soft limits? A soft limit means
the pool can exceed its memory/concurrency limit at moments when Impala is not aware
of it, and while that happens it affects other pools.
	Thanks.

Songbo
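
To make the "soft limit" concern concrete, here is a minimal Python sketch of how several coordinators that each decide from their own, possibly stale, view of a pool can together admit more queries than the configured maximum. It is a toy model only, not Impala's admission code: the `Coordinator` class, the `simulate` function, and all numbers are hypothetical, and it assumes the coordinators exchange no state at all during the burst.

```python
from dataclasses import dataclass


@dataclass
class Coordinator:
    """Toy coordinator that admits queries against its own local count."""
    name: str
    known_running: int = 0  # last-seen number of running queries (may be stale)

    def try_admit(self, max_running: int) -> bool:
        # The decision uses only the local view; queries admitted by
        # other coordinators are invisible until state is exchanged.
        if self.known_running < max_running:
            self.known_running += 1
            return True
        return False


def simulate(num_coordinators: int, requests_each: int, max_running: int) -> int:
    """Return how many queries are admitted before any state is exchanged."""
    coordinators = [Coordinator(f"c{i}") for i in range(num_coordinators)]
    admitted = 0
    for coordinator in coordinators:
        for _ in range(requests_each):
            if coordinator.try_admit(max_running):
                admitted += 1
    return admitted


print(simulate(1, 10, 4))  # 4: a single coordinator respects the limit
print(simulate(3, 10, 4))  # 12: three coordinators each admit up to the limit
```

With one coordinator the limit holds exactly; with three, the pool briefly runs three times its configured maximum, which is the kind of overshoot described above.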

-----Original Message-----
From: Matthew Jacobs [mailto:mj@cloudera.com]
Sent: July 19, 2016 0:44
To: user@impala.incubator.apache.org
Subject: Re: Impala user resource isolation best practice

By the way, some of the controls I mentioned were added in Impala 2.5, so you should consider
upgrading if you're not already using a newer version of Impala.

Thanks,
Matt

On Mon, Jul 18, 2016 at 9:20 AM, Matthew Jacobs <mj@cloudera.com> wrote:
> Hi Songbo,
>
> Right now the best you can do is to use admission control with:
> (a) a single coordinator to avoid the possibility of over-admitting by 
> different coordinators
> (b) setting default query mem limits so that individual queries are 
> limited
>
> For your scenario, I'd recommend setting up 2 pools, one for user A 
> and a second for user B. Set the max number of running queries for 
> user A to something reasonable for the concurrency for that workload.
> Set the max memory for the user B pool to the portion of cluster 
> memory you're willing to give to those queries. (Notice the pool with 
> the small queries has the max number of running queries set and the 
> pool with the fewer but larger queries has the max memory set -- 
> that is intentional, the former is faster for admission but doesn't 
> limit based on memory.) How well this will work depends on how well 
> you can pick good numbers for these settings, which can be difficult 
> and requires studying your workload.
>
> This isn't perfect resource isolation because rogue queries can still 
> consume too much CPU or other resources, but it's the best you'll be 
> able to do right now. In the future we will have better tools to make 
> this easier.
>
> Best,
> Matt
>
> On Mon, Jul 18, 2016 at 2:59 AM, 廖松博 <liaosongbo@gridsum.com> wrote:
>> Hello guys,
>>
>>
>>
>>        My company uses Cloudera Impala as our basic 
>> infrastructure for online data analysis. The most difficult problems 
>> we have met are resource isolation and instability.
>>
>> In our experience with Impala, a big query that consumes a vast 
>> amount of memory can crash the impalad process (acting as a worker 
>> rather than as the coordinator, right?).
>>
>> In our simplest scenario, user A is a very important customer and his 
>> queries are relatively small; user B is an unimportant user who may 
>> issue very large SQL queries to Impala. It is unacceptable for a big 
>> query from user B to crash the impalad process and affect the 
>> experience of user A. So resource isolation is the point.
>>
>> But per the Impala documentation:
>> http://www.cloudera.com/documentation/enterprise/5-6-x/topics/impala_admission.html ,
>> Impala resource isolation is a soft limit and cannot strictly prevent 
>> queries from user B from affecting user A.
>>
>> As far as I know, Llama (running Impala with YARN) is not recommended; 
>> we actually tried it but were disappointed by its performance and accuracy.
>>
>>        Is there any best practice for user resource isolation, so 
>> that different users will not affect each other?
>>
>>        Thanks.
>>
>>
>>
>> Best Regards,
>>
>> Songbo
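
The two-pool setup recommended in the thread can be sketched as a small Python model: one pool capped by the number of running queries (user A), one capped by aggregate memory (user B), each with a default per-query memory limit. This is an illustrative sketch only, not Impala configuration or Impala's admission logic. The `Pool` class, its field names, and every number are hypothetical, and the model simply refuses a query where a real system may queue it instead.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Pool:
    """Toy admission pool with an optional concurrency cap and memory cap."""
    name: str
    max_running: Optional[int] = None   # cap on concurrent queries
    max_mem_mb: Optional[int] = None    # cap on memory reserved by the pool
    default_query_mem_mb: int = 0       # memory assumed for each query
    running: int = 0
    reserved_mb: int = 0

    def admit(self) -> bool:
        # Concurrency check: cheap, but says nothing about memory use.
        if self.max_running is not None and self.running >= self.max_running:
            return False
        # Memory check: uses the per-query limit as the reservation.
        needed = self.reserved_mb + self.default_query_mem_mb
        if self.max_mem_mb is not None and needed > self.max_mem_mb:
            return False
        self.running += 1
        self.reserved_mb = needed
        return True

    def release(self) -> None:
        # Called when a query finishes; frees its slot and reservation.
        if self.running > 0:
            self.running -= 1
            self.reserved_mb -= self.default_query_mem_mb


# Hypothetical numbers: user A runs many small queries, user B few large ones.
pool_a = Pool("user_a", max_running=20, default_query_mem_mb=512)
pool_b = Pool("user_b", max_mem_mb=8192, default_query_mem_mb=2048)

print([pool_b.admit() for _ in range(5)])  # [True, True, True, True, False]
```

In this model the fifth large query from user B is refused because the pool's memory is fully reserved, while user A's pool is unaffected by anything user B submits. As the thread notes, picking workable values for these settings requires studying the actual workload.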