impala-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Sailesh Mukil (Code Review)" <>
Subject [Impala-ASF-CR] IMPALA-4456: Address scalability issues of qs map lock and client request state map lock
Date Mon, 23 Oct 2017 22:13:26 GMT
Sailesh Mukil has uploaded a new patch set (#3). ( )

Change subject: IMPALA-4456: Address scalability issues of qs_map_lock_ and client_request_state_map_lock_

IMPALA-4456: Address scalability issues of qs_map_lock_ and client_request_state_map_lock_

The following 2 locks have shown to be frequent points of contention
on recent perf runs:

- qs_map_lock_
- client_request_state_map_lock_

Since these are process wide locks, any threads waiting on these locks
potentially slow down the runtime of a query.

I tried to address this previously by converting the client_request_state_map_lock_
to a reader-writer lock. This showed great perf improvements in the general
case, however, there were edge cases with big regressions as well.
In the general case, strict readers of the map got through so quickly
that we were able to see a reduction in the number of client connections
created, since this lock was contended for in the context of Thrift threads too.
The bad case is when writers were starved trying to register a new query
since there were so many readers. Changing the starve option resulted in
worse read performance all round.

Another approach which is relatively simpler is to shard the locks, which
proves to be very effective with no regressions. The maps and locks are
shareded to a default of 4 buckets initally.

Query IDs are created by using boost::uuids::random_generator. We use the
high bits of a query ID to assign queries to buckets. I verified that the
distribution of the high bits of a query ID are even across buckets on
my local machine:

For 10,000 queries sharded across 4 buckets, the distribution was:
bucket[0]: 2500
bucket[1]: 2489
bucket[2]: 2566
bucket[3]: 2445

A micro-benchmark is added to measure the improvement in performance. This
benchmark creates multiple threads each of which creates a QueryState and
accesses it multiple times. We can see improvements in the range 2x - 3.5x.

------------------Benchmark 1: Create and access Query States.
Total Time (#Queries: 5 #Accesses: 100) : 1ms
Total Time (#Queries: 50 #Accesses: 100) : 8ms
Total Time (#Queries: 50 #Accesses: 1000) : 54ms
Total Time (#Queries: 500 #Accesses: 100) : 76ms
Total Time (#Queries: 500 #Accesses: 1000) : 543ms

------------------Benchmark 1: Create and access Query States.
Total Time (#Queries: 5 #Accesses: 100) : 2173.59K clock cycles
Total Time (#Queries: 50 #Accesses: 100) : 4ms
Total Time (#Queries: 50 #Accesses: 1000) : 15ms
Total Time (#Queries: 500 #Accesses: 100) : 46ms
Total Time (#Queries: 500 #Accesses: 1000) : 151ms

TODO: Add benchmark for client_request_state_map_lock_ too. The APIs
around that are more complicated, so this patch only includes
the benchmarking of qs_map_lock_.

Change-Id: I61089090e1095da45a8a64ed3ccc78bd310807f1
M be/src/benchmarks/CMakeLists.txt
A be/src/benchmarks/
M be/src/runtime/
M be/src/runtime/query-exec-mgr.h
M be/src/service/
M be/src/service/
M be/src/service/impala-server.h
M be/src/util/uid-util.h
8 files changed, 236 insertions(+), 47 deletions(-)

  git pull ssh:// refs/changes/63/8363/3
To view, visit
To unsubscribe, visit

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I61089090e1095da45a8a64ed3ccc78bd310807f1
Gerrit-Change-Number: 8363
Gerrit-PatchSet: 3
Gerrit-Owner: Sailesh Mukil <>

  • Unnamed multipart/alternative (inline, 8-Bit, 0 bytes)
View raw message