impala-user mailing list archives

From Tim Armstrong <tarmstr...@cloudera.com>
Subject Re: Re: Re: Configuration for Admission Control
Date Mon, 29 Jan 2018 16:26:25 GMT
Yeah, setting a mem_limit based on the workload is currently our
recommended best practice. I've been thinking about how to make this easier
to set up.
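
To illustrate why a pool-level mem_limit gives predictable concurrency, here is a rough sketch of the arithmetic. It assumes a simplified model in which every admitted query reserves its mem_limit on each host it runs on, and the pool's max memory is a cluster-wide total. The function name and the numbers are hypothetical and only meant to show the calculation, not how Impala implements it.

```python
def max_concurrent_queries(pool_max_mem_gb: float,
                           mem_limit_gb: float,
                           num_hosts: int) -> int:
    """Estimate how many queries fit in a pool under a fixed per-host mem_limit.

    Assumption (simplified model): each admitted query reserves
    mem_limit_gb on every one of num_hosts hosts.
    """
    if mem_limit_gb <= 0 or num_hosts <= 0:
        raise ValueError("mem_limit_gb and num_hosts must be positive")
    # Cluster-wide memory reserved by a single query.
    per_query_cluster_mem_gb = mem_limit_gb * num_hosts
    return int(pool_max_mem_gb // per_query_cluster_mem_gb)


if __name__ == "__main__":
    # Hypothetical pool: 400 GB total, 2 GB mem_limit, 10 hosts.
    print(max_concurrent_queries(400, 2, 10))
```

With a fixed mem_limit the result depends only on the pool configuration, not on planner estimates, which is what makes the concurrency deterministic.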

On Thu, Jan 25, 2018 at 7:19 PM, Quanlong Huang <huang_quanlong@126.com>
wrote:

> Hi Jeszy, thanks for your reply!
>
> 1. We may set the pool's mem_limit to 0 at first, to keep all users comfortable.
> After we collect enough performance metrics, we should be able to come up with a suitable mem_limit.
>
> 2. That's a pity...
>
> Thanks,
> Quanlong
>
> At 2018-01-25 15:58:40, "Jeszy" <jeszyb@gmail.com> wrote:
> >Hey Quanlong,
> >
> >1. Impala estimates the memory usage at planning time, and runtime
> >statistics for a specific run aren't reused on subsequent runs, so the
> >estimate changes only when the plan changes, or when statistics
> >change. Also, estimates are often wrong (usually overestimating). The
> >'mem_limit' query option will override estimates; it's good practice
> >to apply it at the pool level, so you can get deterministic
> >concurrency. This can be difficult though, as it requires you to
> >assign queries to pools based on memory usage / allowance.
> >
> >2. No, this isn't possible currently.
> >
> >HTH!
> >
> >On 25 January 2018 at 07:43, Quanlong Huang <huang_quanlong@126.com> wrote:
> >> Thanks, Tim!
> >>
> >> 1. The soft limit is exactly what we want. I have another question: how
> >> does Impala estimate the memory usage of a query? It seems that the
> >> estimate doesn't change even after the query runs again and the actual usage is
> >> much smaller than the estimate.
> >>
> >> 2. I think we need a template for configuration. Something like Presto
> >> provided: https://prestodb.io/docs/current/admin/queue.html. Every new user
> >> corresponds to a new pool. The admin doesn't need to create a pool manually
> >> for him/her. For detailed limit types, I think more are welcome. Currently,
> >> we have two requirements:
> >>   a. Each user can run no more than 5 queries in parallel.
> >>   b. The total number of queries running in parallel across the whole system
> >> should be no more than 20.
> >> Is it right that I can't configure this at the moment?
> >>
> >> Thanks,
> >> Quanlong
> >>
> >>
> >> At 2018-01-25 02:49:33,"Tim Armstrong" <tarmstrong@cloudera.com> wrote:
> >>
> >> Hi Quanlong,
> >>
> >>  1. Admission control memory limits for pools actually behave as soft limits
> >> already - admission control won't kill queries if the pool's limits are
> >> exceeded. It is limited to admitting/queueing/rejecting a query.
> >>
> >> The hard limits are the query and process memory limits. If an individual
> >> query's mem_limit is exceeded, it will be killed, or if the Impala daemon
> >> process's total memory limit is exceeded, queries will be killed until it gets
> >> under the limit.
> >>
> >> In the short-to-medium term we're working to avoid that kind of
> >> out-of-memory as much as possible. If a query can't run with a given
> >> mem_limit, it shouldn't be admitted. If it is admitted, it should regulate
> >> its own memory consumption by spilling to disk, etc, to stay under the
> >> mem_limit. We have a lot of pieces for that already (e.g. the big revamp of
> >> spill-to-disk in IMPALA-3200 and the HDFS scanner patches I have out for
> >> review right now).
> >>
> >> 2. We don't support this right now, but that is a very good idea. I'm not
> >> sure what exactly the right policy is. Maybe limiting each user to a fixed
> >> number of queries is reasonable, or maybe there should also be some kind of
> >> fairness (e.g. a user can't consume more than x% of the remaining resources
> >> in the pool). Would be interested in your thoughts.
> >>
> >> - Tim
> >>
> >> On Wed, Jan 24, 2018 at 5:53 AM, Quanlong Huang <huang_quanlong@126.com>
> >> wrote:
> >>>
> >>> Hi all,
> >>>
> >>> We're going to use Admission Control to support multi-tenancy. I have
> >>> several questions about the configuration:
> >>>
> >>> 1. Is there a config for a soft memory limit on a queue? i.e. when the
> >>> queries in a pool have consumed more memory in total than the soft
> >>> limit, they won't fail directly, but queries submitted later to this
> >>> pool will be queued.
> >>>
> >>> 2. Can we configure the max number of concurrently running queries for each user
> >>> to be no more than a limit (e.g. 10)? Currently, I have to create a pool
> >>> for each user to do this. This is not scalable if we have tens of users, and we
> >>> have to add a new pool for each new user.
> >>>
> >>> Thanks,
> >>> Quanlong
>
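
The soft-limit behaviour discussed in the thread (admission control only admits, queues, or rejects, and never kills running queries) can be sketched roughly as below. This is a simplified, hypothetical model: the `Pool` class, its fields, and the `admit` function are made up for illustration and do not correspond to Impala's actual code or configuration names.

```python
from dataclasses import dataclass


@dataclass
class Pool:
    """Hypothetical pool state for a toy admission-control model."""
    max_mem_gb: float            # cluster-wide memory cap for the pool
    max_queued: int              # how many queries may wait in the queue
    admitted_mem_gb: float = 0.0 # memory reserved by admitted queries
    queued: int = 0              # queries currently waiting


def admit(pool: Pool, query_mem_gb: float) -> str:
    """Return "ADMIT", "QUEUE", or "REJECT" for an incoming query.

    Running queries are never touched here: the pool limit only affects
    the decision for the new query, which is what makes it a soft limit.
    """
    if query_mem_gb > pool.max_mem_gb:
        # The query could never fit in this pool, so queueing is pointless.
        return "REJECT"
    if pool.admitted_mem_gb + query_mem_gb <= pool.max_mem_gb:
        pool.admitted_mem_gb += query_mem_gb
        return "ADMIT"
    if pool.queued < pool.max_queued:
        pool.queued += 1
        return "QUEUE"
    return "REJECT"


if __name__ == "__main__":
    pool = Pool(max_mem_gb=40, max_queued=1)
    for mem in (20, 20, 20, 20):
        print(admit(pool, mem))
```

In this toy model the hard limits (per-query mem_limit and the daemon's process limit) are deliberately left out, since they are enforced at run time rather than at admission.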
