ambari-dev mailing list archives

From "Jonathan Hurley" <>
Subject Re: Review Request 37161: Cluster creates fail on larger deployments with SQL Azure DB
Date Fri, 07 Aug 2015 18:38:33 GMT

This is an automatically generated e-mail. To reply, visit:

(Updated Aug. 7, 2015, 2:38 p.m.)

Review request for Ambari, Myroslav Papirkovskyy, Sumit Mohanty, and Sid Wagle.


The original patch had an error where it returned requests with tasks in progress. A few changes
were made to fix this problem:
- The original SQL was changed back so that it still does a nested SELECT
- The request for COMPLETED is only made now if the map is empty
- We are not actually doing the nested select from the executor anymore; since this map is
relatively small, we are retrieving the requests which may be cached and using their calculated

Bugs: AMBARI-12657

Repository: ambari


We started doing larger cluster creates (48 worker nodes) with SQL Azure DB as the Ambari DB,
and we are seeing the HTTP GET requests below time out on the client side (even after retries),
resulting in cluster create failures (15%). This is a tracking Jira to resolve the CRUD failures.

What I’m seeing is that DB CPU usage goes above 50% in some of my experiments for 48 node
clusters. This might explain why SQL is running slow.

Basically, it’s this one query which consumes most of the CPU. Query plan is also attached.

SELECT DISTINCT t0.request_id FROM host_role_command t0
WHERE NOT EXISTS (SELECT @P0 FROM host_role_command t1
WHERE (t1.status IN (@P1,@P2,@P3,@P4,@P5,@P6,@P7,@P8,@P9)))
ORDER BY t0.request_id ASC

There's no need to query the same table twice here; we can eliminate the inner SELECT and
use a `NOT IN` clause.
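
As a rough illustration of why a row-level `NOT IN` filter is not interchangeable with the
nested SELECT, here is a minimal in-memory sketch in Java. It is not the Ambari DAO code: the
`Task` class, the sample data, and the three status names treated as "in progress" are
assumptions made for the example (the real query binds nine status parameters that are not
listed here). It shows one plausible way a request that still has a task in progress can end
up in the result when only individual rows are filtered.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.SortedSet;
import java.util.TreeSet;

class CompletedRequests {
  // Hypothetical stand-in for one host_role_command row.
  static final class Task {
    final long requestId;
    final String status;

    Task(long requestId, String status) {
      this.requestId = requestId;
      this.status = status;
    }
  }

  // Assumed set of "in progress" statuses; illustrative only.
  static final Set<String> IN_PROGRESS =
      new HashSet<>(Arrays.asList("PENDING", "QUEUED", "IN_PROGRESS"));

  // Row-level filter, like "WHERE status NOT IN (...)": keeps a request
  // as soon as ANY of its tasks is not in progress.
  static SortedSet<Long> rowLevelNotIn(List<Task> tasks) {
    SortedSet<Long> result = new TreeSet<>();
    for (Task t : tasks) {
      if (!IN_PROGRESS.contains(t.status)) {
        result.add(t.requestId);
      }
    }
    return result;
  }

  // Request-level filter, like a per-request NOT EXISTS: keeps a request
  // only if NONE of its tasks is in progress.
  static SortedSet<Long> noTaskInProgress(List<Task> tasks) {
    Set<Long> busy = new HashSet<>();
    for (Task t : tasks) {
      if (IN_PROGRESS.contains(t.status)) {
        busy.add(t.requestId);
      }
    }
    SortedSet<Long> result = new TreeSet<>();
    for (Task t : tasks) {
      if (!busy.contains(t.requestId)) {
        result.add(t.requestId);
      }
    }
    return result;
  }

  // Sample data: request 1 is finished, request 2 is partly finished,
  // request 3 has not started.
  static List<Task> sampleTasks() {
    List<Task> tasks = new ArrayList<>();
    tasks.add(new Task(1, "COMPLETED"));
    tasks.add(new Task(1, "COMPLETED"));
    tasks.add(new Task(2, "COMPLETED"));
    tasks.add(new Task(2, "IN_PROGRESS"));
    tasks.add(new Task(3, "QUEUED"));
    return tasks;
  }

  public static void main(String[] args) {
    List<Task> tasks = sampleTasks();
    System.out.println(rowLevelNotIn(tasks));    // prints [1, 2]
    System.out.println(noTaskInProgress(tasks)); // prints [1]
  }
}
```

In this sketch request 2 appears in the row-level result even though one of its tasks is
still running, while the request-level check returns only request 1.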

Diffs (updated)

  ambari-server/src/main/java/org/apache/ambari/server/orm/dao/ a72e1fe




mvn clean test

Tests run: 3112, Failures: 0, Errors: 0, Skipped: 23

[INFO] ------------------------------------------------------------------------
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 01:02 h
[INFO] Finished at: 2015-08-05T21:21:52-04:00
[INFO] Final Memory: 29M/847M

Verified the new SQL works on all databases.


Jonathan Hurley
