impala-reviews mailing list archives

From "Dan Hecht (Code Review)" <ger...@cloudera.org>
Subject [Impala-ASF-CR] IMPALA-1972/IMPALA-3882: Fix client request state map lock contention
Date Fri, 19 May 2017 21:37:51 GMT
Dan Hecht has posted comments on this change.

Change subject: IMPALA-1972/IMPALA-3882: Fix client_request_state_map_lock_ contention
......................................................................


Patch Set 7:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/6707/7/be/src/service/impala-beeswax-server.cc
File be/src/service/impala-beeswax-server.cc:

PS7, Line 296: NULL
nit: change that one too, at least to keep the functions consistent.


http://gerrit.cloudera.org:8080/#/c/6707/7/be/src/service/impala-http-handler.cc
File be/src/service/impala-http-handler.cc:

PS7, Line 721: just return
That's not what the code does (it also sets plan_metadata_unavailable); please rephrase.
You could rephrase the whole comment as:

If the query plan isn't generated, avoid waiting for the lock, which could take a while if
catalog metadata is being loaded.
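To illustrate, here is a rough Python model of the pattern the suggested comment describes. It is a hypothetical sketch: `RequestState`, `query_summary`, and the `plan is None` check stand in for the C++ code in impala-http-handler.cc and are not its actual names, types, or state checks.

```python
import threading


class RequestState:
    """Hypothetical per-query state (illustrative only, not Impala's class)."""

    def __init__(self):
        self.lock = threading.Lock()  # may be held by planning for a long time
        self.plan = None              # set once a plan has been generated


def query_summary(state):
    """Build a summary dict without waiting on the lock if no plan exists yet.

    Assumes (for this sketch) that checking `plan` without the lock is safe;
    the real code would check its own execution state instead.
    """
    summary = {}
    if state.plan is None:
        # No plan yet: the lock holder may be loading catalog metadata,
        # so don't wait for the lock; report that plan metadata is unavailable.
        summary["plan_metadata_unavailable"] = True
        return summary
    with state.lock:
        summary["plan"] = state.plan
    return summary


# Usage: even with the lock held (planning stuck on metadata), this returns at once.
state = RequestState()
state.lock.acquire()
print(query_summary(state))  # {'plan_metadata_unavailable': True}
state.lock.release()
```

The point of the early return is that it both skips the wait and tells the caller why no plan is shown, which is why the comment should mention more than "just return".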


PS7, Line 730: adopt_lock_t
shouldn't that be deleted?


http://gerrit.cloudera.org:8080/#/c/6707/7/tests/custom_cluster/test_query_concurrency.py
File tests/custom_cluster/test_query_concurrency.py:

PS7, Line 32: The intention here is to check contention on the query_exec_state_map_lock_
This is talking about how the old code worked, which won't make sense to people reading the
current code (after this change). It should say something like:

The intention is to check that the webserver does not hold any global locks or otherwise prevent
impalad from servicing new requests.


PS7, Line 54: This creates lock contention on the coordinator by
            :     calling QuerySummaryHandler() method
This is no longer true with your fix. How about saying:

This is to verify that QuerySummaryHandler() does not hold any global locks that would, for
example, prevent another query from starting.
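As a toy illustration of the property this test should check, here is a Python model that assumes (hypothetically) a global lock guarding only the query map while each query carries its own lock. `QueryRegistry`, `summary_handler`, and `new_query_can_start` are illustrative names, not Impala's classes or the actual fix.

```python
import threading
import time


class QueryRegistry:
    """Toy model: the global map lock is held only for map lookups/updates."""

    def __init__(self):
        self._map_lock = threading.Lock()
        self._queries = {}  # query_id -> per-query lock

    def register(self, query_id):
        with self._map_lock:
            self._queries[query_id] = threading.Lock()

    def get(self, query_id):
        # Hold the global lock only long enough to look up the entry.
        with self._map_lock:
            return self._queries.get(query_id)


def summary_handler(registry, query_id):
    per_query_lock = registry.get(query_id)  # global lock already released here
    with per_query_lock:                     # may block for a long time
        return "summary of " + query_id


def new_query_can_start(registry, timeout=1.0):
    """Return True if a new query can register within `timeout` seconds."""
    done = threading.Event()

    def start_query():
        registry.register("q2")
        done.set()

    threading.Thread(target=start_query, daemon=True).start()
    return done.wait(timeout)


def demo():
    registry = QueryRegistry()
    registry.register("q1")
    q1_lock = registry.get("q1")
    q1_lock.acquire()  # simulate q1 holding its own lock (e.g. while planning)
    handler = threading.Thread(
        target=summary_handler, args=(registry, "q1"), daemon=True)
    handler.start()
    time.sleep(0.1)  # best effort: let the handler block; the result doesn't depend on it
    ok = new_query_can_start(registry)
    q1_lock.release()
    handler.join()
    return ok


print(demo())  # True
```

If the handler instead held the global map lock while waiting on the per-query lock, `register("q2")` would block and `new_query_can_start` would return False, which is the regression the test is meant to catch.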


PS7, Line 74: time.sleep(2)
I'm worried that this will be flaky, especially with ASAN. Instead of this delay, couldn't
we just wait for in_flight_queries to become 1? You could use the parameter to get_in_flight_queries()
to do that by passing some largish value. That has the advantage that we'll wait only as long
as necessary for the value to change to 1, so we can use a relatively long timeout (rather
than a delay).
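Here is a minimal sketch of the poll-until-true-with-timeout approach suggested above. `wait_for` is a hypothetical helper, not part of the Impala test framework, and the commented usage line assumes get_in_flight_queries() returns a list of query IDs. That is an assumption about its return type.

```python
import time


def wait_for(predicate, timeout_s, interval_s=1.0):
    """Poll predicate() until it returns True or timeout_s elapses.

    Returns True as soon as the condition holds, so a generous timeout costs
    nothing in the common case; returns False on timeout (the last sleep may
    overshoot the deadline by up to one interval).
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)


# Hypothetical use in the test, replacing time.sleep(2):
#   assert wait_for(lambda: len(impalad.service.get_in_flight_queries()) == 1, 60)

# Self-contained demo: a condition that becomes true on the third poll.
polls = []


def third_poll():
    polls.append(1)
    return len(polls) >= 3


print(wait_for(third_poll, timeout_s=5, interval_s=0.01))  # True
```

A fixed sleep has to be tuned to the slowest build (e.g. ASAN) and still races. Polling against a deadline only pays for the time the condition actually takes.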


PS7, Line 83: time.sleep(2)
This delay is a bit harder to eliminate. How about we increase --stress_metadata_loading_pause_injection_ms
to something really large, say 1000 seconds (which doesn't matter -- we don't actually need
the queries to finish planning to end the test, right?). 

And then we can use a larger timeout here, but we don't need to delay for it. We can just
do:

inflight_query_ids = impalad.service.get_in_flight_queries(30)

which will poll the webui once per second and give up after 30 seconds.


-- 
To view, visit http://gerrit.cloudera.org:8080/6707
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie44daa93e3ae4d04d091261f3ec4891caffe8026
Gerrit-PatchSet: 7
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Bharath Vissapragada <bharathv@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bharathv@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dhecht@cloudera.com>
Gerrit-Reviewer: Henry Robinson <henry@cloudera.com>
Gerrit-HasComments: Yes
