ambari-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Greg Senia (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (AMBARI-9334) Ambari StageDAO.findByCommandStatuses causes Postgress HIGH CPU
Date Wed, 04 Feb 2015 16:04:36 GMT

    [ https://issues.apache.org/jira/browse/AMBARI-9334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14305374#comment-14305374
] 

Greg Senia commented on AMBARI-9334:
------------------------------------

My concern is the fact that the original query which came from the postgres logging showed
the query with the nested select... And as you can see by multiple outputs the query returns
the same result. so what you are saying is the original query must be broken also because
the distinct is clearly working correctly and returning the same result as the nested select...

ambari=> \d+ stage
                              Table "ambari.stage"
      Column       |          Type          | Modifiers | Storage  | Description 
-------------------+------------------------+-----------+----------+-------------
 stage_id          | bigint                 | not null  | plain    | 
 request_id        | bigint                 | not null  | plain    | 
 cluster_id        | bigint                 | not null  | plain    | 
 log_info          | character varying(255) | not null  | extended | 
 request_context   | character varying(255) |           | extended | 
 cluster_host_info | bytea                  | not null  | extended | 
Indexes:
    "stage_pkey" PRIMARY KEY, btree (stage_id, request_id)
Foreign-key constraints:
    "fk_stage_request_id" FOREIGN KEY (request_id) REFERENCES request(request_id)
Referenced by:
    TABLE "host_role_command" CONSTRAINT "fk_host_role_command_stage_id" FOREIGN KEY (stage_id,
request_id) REFERENCES stage(stage_id, request_id)
    TABLE "role_success_criteria" CONSTRAINT "role_success_criteria_stage_id" FOREIGN KEY
(stage_id, request_id) REFERENCES stage(stage_id, request_id)
Has OIDs: no



ambari=> SELECT distinct t0.stage_id, t0.cluster_host_info, t0.cluster_id, t0.log_info,
t0.request_context, t0.request_id FROM stage t0, host_role_command t1 WHERE ((t0.request_id
= t1.request_id) AND t0.stage_id =t1.stage_id and (t1.status IN ('QUEUED','IN_PROGRESS','PENDING')))
ORDER BY t0.stage_id,t0.request_id;
 stage_id |                                                                              
                                          
                                                                                         
                                          
                                                                                         
                                          
                                                cluster_host_info                        
                                          
                                                                                         
                                          
                                                                                         
                                          
                                                                                         
            | cluster_id |    log_info    
|      request_context      | request_id 
----------+-------------------------------------------------------------------------------------------------------------------------
------------------------------------------------------------------------------------------------------------------------------------
------------------------------------------------------------------------------------------------------------------------------------
------------------------------------------------------------------------------------------------------------------------------------
------------------------------------------------------------------------------------------------------------------------------------
------------------------------------------------------------------------------------------------------------------------------------
------------------------------------------------------------------------------------------------------+------------+----------------
+---------------------------+------------
        1 | {"hs_host":["3"],"namenode_host":["5"],"snamenode_host":["3"],"zookeeper_hosts":["5-7"],"hbase_rs_hosts":["0-2","4","8"]
,"drpc_server_hosts":["7"],"supervisor_hosts":["0-2","4","8"],"ganglia_monitor_hosts":["0-8"],"rm_host":["3"],"slave_hosts":["0-2","
4","8"],"ambari_server_host":["xlab370.example.com"],"all_ping_ports":["8670:0-8"],"hive_metastore_hosts":["6"],"storm_rest_api_hosts
":["7"],"falcon_server_hosts":["5"],"hbase_master_hosts":["7"],"ganglia_server_host":["3"],"nm_hosts":["0-2","4","8"],"hive_server_h
ost":["6"],"hive_mysql_host":["6"],"all_hosts":["xlab362.example.com","xlab365.example.com","xlab366.example.com","xlab370.example.com",
"xlab364.example.com","xlab369.example.com","xlab368.example.com","xlab367.example.com","xlab363.example.com"],"oozie_server":["3"],"stor
m_ui_server_hosts":["7"],"webhcat_server_host":["6"],"nagios_server_host":["3"],"nimbus_hosts":["7"]}
|          2 | /tmp/ambari:59 
| _PARSE_.STOP.ALL_SERVICES |         59
        2 | {"hs_host":["3"],"namenode_host":["5"],"snamenode_host":["3"],"zookeeper_hosts":["5-7"],"hbase_rs_hosts":["0-2","4","8"]
,"drpc_server_hosts":["7"],"supervisor_hosts":["0-2","4","8"],"ganglia_monitor_hosts":["0-8"],"rm_host":["3"],"slave_hosts":["0-2","
4","8"],"ambari_server_host":["xlab370.example.com"],"all_ping_ports":["8670:0-8"],"hive_metastore_hosts":["6"],"storm_rest_api_hosts
":["7"],"falcon_server_hosts":["5"],"hbase_master_hosts":["7"],"ganglia_server_host":["3"],"nm_hosts":["0-2","4","8"],"hive_server_h
ost":["6"],"hive_mysql_host":["6"],"all_hosts":["xlab362.example.com","xlab365.example.com","xlab366.example.com","xlab370.example.com",
"xlab364.example.com","xlab369.example.com","xlab368.example.com","xlab367.example.com","xlab363.example.com"],"oozie_server":["3"],"stor
m_ui_server_hosts":["7"],"webhcat_server_host":["6"],"nagios_server_host":["3"],"nimbus_hosts":["7"]}
|          2 | /tmp/ambari:59 
| _PARSE_.STOP.ALL_SERVICES |         59
        3 | {"hs_host":["3"],"namenode_host":["5"],"snamenode_host":["3"],"zookeeper_hosts":["5-7"],"hbase_rs_hosts":["0-2","4","8"]
,"drpc_server_hosts":["7"],"supervisor_hosts":["0-2","4","8"],"ganglia_monitor_hosts":["0-8"],"rm_host":["3"],"slave_hosts":["0-2","
4","8"],"ambari_server_host":["xlab370.example.com"],"all_ping_ports":["8670:0-8"],"hive_metastore_hosts":["6"],"storm_rest_api_hosts
":["7"],"falcon_server_hosts":["5"],"hbase_master_hosts":["7"],"ganglia_server_host":["3"],"nm_hosts":["0-2","4","8"],"hive_server_h
ost":["6"],"hive_mysql_host":["6"],"all_hosts":["xlab362.example.com","xlab365.example.com","xlab366.example.com","xlab370.example.com",
"xlab364.example.com","xlab369.example.com","xlab368.example.com","xlab367.example.com","xlab363.example.com"],"oozie_server":["3"],"stor
m_ui_server_hosts":["7"],"webhcat_server_host":["6"],"nagios_server_host":["3"],"nimbus_hosts":["7"]}
|          2 | /tmp/ambari:59 
| _PARSE_.STOP.ALL_SERVICES |         59
(3 rows)

ambari=> SELECT t0.stage_id, t0.cluster_host_info, t0.cluster_id, t0.log_info, t0.request_context,
t0.request_id FROM stage t0 WHERE t0.stage_id IN (SELECT t1.stage_id FROM host_role_command
t1 WHERE ((t0.request_id = t1.request_id) AND (t1.status IN ('QUEUED','IN_PROGRESS','PENDING'))))
ORDER BY t0.request_id, t0.stage_id;  
 stage_id |                                                                              
                                          
                                                                                         
                                          
                                                                                         
                                          
                                                cluster_host_info                        
                                          
                                                                                         
                                          
                                                                                         
                                          
                                                                                         
            | cluster_id |    log_info    
|      request_context      | request_id 
----------+-------------------------------------------------------------------------------------------------------------------------
------------------------------------------------------------------------------------------------------------------------------------
------------------------------------------------------------------------------------------------------------------------------------
------------------------------------------------------------------------------------------------------------------------------------
------------------------------------------------------------------------------------------------------------------------------------
------------------------------------------------------------------------------------------------------------------------------------
------------------------------------------------------------------------------------------------------+------------+----------------
+---------------------------+------------
        1 | {"hs_host":["3"],"namenode_host":["5"],"snamenode_host":["3"],"zookeeper_hosts":["5-7"],"hbase_rs_hosts":["0-2","4","8"]
,"drpc_server_hosts":["7"],"supervisor_hosts":["0-2","4","8"],"ganglia_monitor_hosts":["0-8"],"rm_host":["3"],"slave_hosts":["0-2","
4","8"],"ambari_server_host":["xlab370.example.com"],"all_ping_ports":["8670:0-8"],"hive_metastore_hosts":["6"],"storm_rest_api_hosts
":["7"],"falcon_server_hosts":["5"],"hbase_master_hosts":["7"],"ganglia_server_host":["3"],"nm_hosts":["0-2","4","8"],"hive_server_h
ost":["6"],"hive_mysql_host":["6"],"all_hosts":["xlab362.example.com","xlab365.example.com","xlab366.example.com","xlab370.example.com",
"xlab364.example.com","xlab369.example.com","xlab368.example.com","xlab367.example.com","xlab363.example.com"],"oozie_server":["3"],"stor
m_ui_server_hosts":["7"],"webhcat_server_host":["6"],"nagios_server_host":["3"],"nimbus_hosts":["7"]}
|          2 | /tmp/ambari:59 
| _PARSE_.STOP.ALL_SERVICES |         59
        2 | {"hs_host":["3"],"namenode_host":["5"],"snamenode_host":["3"],"zookeeper_hosts":["5-7"],"hbase_rs_hosts":["0-2","4","8"]
,"drpc_server_hosts":["7"],"supervisor_hosts":["0-2","4","8"],"ganglia_monitor_hosts":["0-8"],"rm_host":["3"],"slave_hosts":["0-2","
4","8"],"ambari_server_host":["xlab370.example.com"],"all_ping_ports":["8670:0-8"],"hive_metastore_hosts":["6"],"storm_rest_api_hosts
":["7"],"falcon_server_hosts":["5"],"hbase_master_hosts":["7"],"ganglia_server_host":["3"],"nm_hosts":["0-2","4","8"],"hive_server_h
ost":["6"],"hive_mysql_host":["6"],"all_hosts":["xlab362.example.com","xlab365.example.com","xlab366.example.com","xlab370.example.com",
"xlab364.example.com","xlab369.example.com","xlab368.example.com","xlab367.example.com","xlab363.example.com"],"oozie_server":["3"],"stor
m_ui_server_hosts":["7"],"webhcat_server_host":["6"],"nagios_server_host":["3"],"nimbus_hosts":["7"]}
|          2 | /tmp/ambari:59 
| _PARSE_.STOP.ALL_SERVICES |         59
        3 | {"hs_host":["3"],"namenode_host":["5"],"snamenode_host":["3"],"zookeeper_hosts":["5-7"],"hbase_rs_hosts":["0-2","4","8"]
,"drpc_server_hosts":["7"],"supervisor_hosts":["0-2","4","8"],"ganglia_monitor_hosts":["0-8"],"rm_host":["3"],"slave_hosts":["0-2","
4","8"],"ambari_server_host":["xlab370.example.com"],"all_ping_ports":["8670:0-8"],"hive_metastore_hosts":["6"],"storm_rest_api_hosts
":["7"],"falcon_server_hosts":["5"],"hbase_master_hosts":["7"],"ganglia_server_host":["3"],"nm_hosts":["0-2","4","8"],"hive_server_h
ost":["6"],"hive_mysql_host":["6"],"all_hosts":["xlab362.example.com","xlab365.example.com","xlab366.example.com","xlab370.example.com",
"xlab364.example.com","xlab369.example.com","xlab368.example.com","xlab367.example.com","xlab363.example.com"],"oozie_server":["3"],"stor
m_ui_server_hosts":["7"],"webhcat_server_host":["6"],"nagios_server_host":["3"],"nimbus_hosts":["7"]}
|          2 | /tmp/ambari:59 
| _PARSE_.STOP.ALL_SERVICES |         59
(3 rows)

> Ambari StageDAO.findByCommandStatuses causes Postgress HIGH CPU
> ---------------------------------------------------------------
>
>                 Key: AMBARI-9334
>                 URL: https://issues.apache.org/jira/browse/AMBARI-9334
>             Project: Ambari
>          Issue Type: Bug
>          Components: ambari-server
>    Affects Versions: 1.6.0, 1.6.1, 1.7.0
>         Environment: RHEL 6.4/6.5
> postgresql-server-8.4.13-1.el6_3.x86_64
> postgresql-8.4.13-1.el6_3.x86_64
> postgresql-libs-8.4.13-1.el6_3.x86_64
> ambari-agent-1.6.1-98.x86_64
> ambari-log4j-1.6.1.98-1.noarch
> ambari-server-1.6.1-98.noarch
>            Reporter: Greg Senia
>            Assignee: Jonathan Hurley
>            Priority: Critical
>             Fix For: 2.0.0
>
>         Attachments: AMBARI-9334.patch, AMBARI-9334.patch.2, Screen Shot 2015-01-26 at
12.18.56 PM.png, Screen Shot 2015-01-26 at 12.19.26 PM.png, StageDAO-1.6.1.patch, StageDAO.java
>
>
> The following code that generates a query causes postgres to use lots of CPU espcially
if the Ambari DB grows over time. We reduced CPU by 30-40% by fixing the code below.
> Before:
> public List<StageEntity> findByCommandStatuses(Collection<HostRoleStatus>
statuses) {
> TypedQuery<StageEntity> query = entityManagerProvider.get().createQuery("SELECT
stage " +
> "FROM StageEntity stage WHERE stage.stageId IN (SELECT hrce.stageId FROM " +
> "HostRoleCommandEntity hrce WHERE stage.requestId = hrce.requestId and hrce.status IN
?1 ) " +
> "ORDER BY stage.requestId, stage.stageId", StageEntity.class);
> return daoUtils.selectList(query, statuses);
> }
> After:
> @RequiresSession
> public List<StageEntity> findByCommandStatuses(Collection<HostRoleStatus>
statuses) {
> TypedQuery<StageEntity> query = entityManagerProvider.get().createQuery("SELECT
stage "+
> "FROM StageEntity stage, HostRoleCommandEntity hrce " +
> "WHERE stage.requestId = hrce.requestId AND stage.stageId = hrce.stageId and hrce.status
IN ?1 " +
> "ORDER BY stage.requestId, stage.stageId", StageEntity.class);
> return daoUtils.selectList(query, statuses);
> }
> Before EXPLAIN ANALYZE:
> ambari=> explain analyze SELECT t0.stage_id, t0.cluster_host_info, t0.cluster_id,
t0.log_info, t0.request_context, t0.request_id FROM stage t0 WHERE t0.stage_id IN (SELECT
t1.stage_id FROM host_role_command t1 WHERE ((t0.request_id = t1.request_id) AND (t1.status
IN ('QUEUED','IN_PROGRESS','PENDING')))) ORDER BY t0.request_id, t0.stage_id
> ;
>                                                              QUERY PLAN             
                                              
> ------------------------------------------------------------------------------------------------------------------------------------
> Sort  (cost=7407488.50..7407492.69 rows=1676 width=894) (actual time=55418.086..55418.086
rows=0 loops=1)
>    Sort Key: t0.request_id, t0.stage_id
>    Sort Method:  quicksort  Memory: 25kB
>    ->  Seq Scan on stage t0  (cost=0.00..7407398.75 rows=1676 width=894) (actual time=55418.081..55418.081
rows=0 loops=1)
>          Filter: (SubPlan 1)
>          SubPlan 1
>            ->  Seq Scan on host_role_command t1  (cost=0.00..4418.07 rows=1 width=8)
(actual time=16.514..16.514 rows=0 loops=3353)
>                  Filter: (($0 = request_id) AND ((status)::text = ANY ('{QUEUED,IN_PROGRESS,PENDING}'::text[])))
> Total runtime: 55418.123 ms
> (9 rows)
> After: Explain Analyze:
> ambari=> explain analyze SELECT t0.stage_id, t0.cluster_host_info, t0.cluster_id,
t0.log_info, t0.request_context, t0.request_id FROM stage t0, host_role_command t1 WHERE ((t0.request_id
= t1.request_id) AND t0.stage_id =t1.stage_id and (t1.status IN ('QUEUED','IN_PROGRESS','PENDING')))
ORDER BY t0.stage_id,t0.request_id
> ;
>                                                            QUERY PLAN               
                                          
> --------------------------------------------------------------------------------------------------------------------------------
> Sort  (cost=4346.51..4346.52 rows=2 width=894) (actual time=53.605..53.605 rows=0 loops=1)
>    Sort Key: t0.stage_id, t0.request_id
>    Sort Method:  quicksort  Memory: 25kB
>    ->  Nested Loop  (cost=0.00..4346.50 rows=2 width=894) (actual time=53.596..53.596
rows=0 loops=1)
>          ->  Seq Scan on host_role_command t1  (cost=0.00..4338.22 rows=1 width=16)
(actual time=53.595..53.595 rows=0 loops=1)
>                Filter: ((status)::text = ANY ('{QUEUED,IN_PROGRESS,PENDING}'::text[]))
>          ->  Index Scan using stage_pkey on stage t0  (cost=0.00..8.27 rows=1 width=894)
(never executed)
>                Index Cond: ((t0.stage_id = t1.stage_id) AND (t0.request_id = t1.request_id))
> Total runtime: 53.654 ms
> (9 rows)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message