impala-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Peter Ebert (JIRA)" <>
Subject [jira] [Created] (IMPALA-5701) Additional mem_limit settings for resource management (min mem, max before spill, max spill)
Date Sun, 23 Jul 2017 23:11:00 GMT
Peter Ebert created IMPALA-5701:

             Summary: Additional mem_limit settings for resource management (min mem, max
before spill, max spill)
                 Key: IMPALA-5701
             Project: IMPALA
          Issue Type: Improvement
          Components: Backend
    Affects Versions: Impala 2.9.0
            Reporter: Peter Ebert

I would like to propose adding some alternative controls to the current mem_limit in admission
control. [1] a minimum memory that each query should get for startup [2] a maximum memory
a query may take before spilling [3] a maximum spill size so that one query can not take up
all spill space.

Problem statement: For ad-hoc and exploratory workloads memory use is rarely known in advance
and failures due to out of memory present a poor user experience as the average user has to
try and retry with increasing memory limits, then when they issue their next small query it's
likely their increased memory limit will reduce the concurrency for others in the pool (e.g.
if they set the mem_limit to 20GB for their last query, but need only 3GB for their next query).

Current state:
As an example, lets say you have a pool with 60GB of RAM and the average query uses less than
3GB with some queries using 20GB.

So for the example workload above in the current state you either choose 3GB and have a concurrency
of 20 queries, or set it to 20GB to keep large queries from failing, and have only 3 queries
for the 60GB pool.  Any user manually increasing their limit to 20GB would greatly reduce
the other number of queries that can be run (1user*20GB+13users*3GB=>14 queries at once).

With additional settings: If you have the 3 settings mentioned above, you could set the minimum
memory to 1GB, (assuming this is enough for spilling to occur), this would prevent a query
from failing to startup/spill when usage is at 59.5/60GB.  This means you could theoretically
have 60 queries instead of 20 at a time (in the previous example with mem_limit) if they were
all relatively small queries.  Then to avoid single 'big' queries from taking up all the memory
you would set a max mem at which point spilling would occur, very similar to the current mem_limit.
 In our case that could be 20GB and all queries within that limit would finish quickly without
spilling, and the one 'bad' query would only reduce the total concurrency of impala to 41
(1user*20GB+40users*1GB).  Finally to avoid that 'bad' query from using up all the spill space
a limit would be set for the max amount of spill.

I'm sure this is not the only solution, nor the best one, but wanted to share the idea as
I believe it would greatly improve the user experience.

This message was sent by Atlassian JIRA

View raw message