ambari-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Jonathan Hurley (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (AMBARI-9334) Ambari StageDAO.findByCommandStatuses causes Postgress HIGH CPU
Date Wed, 04 Feb 2015 16:20:35 GMT

    [ https://issues.apache.org/jira/browse/AMBARI-9334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14305425#comment-14305425
] 

Jonathan Hurley commented on AMBARI-9334:
-----------------------------------------

JPQL doesn't allow DISTINCT where the entity has a LOB nor does it allow subqueries in a JOIN;
both would have also fixed this problem. The original query did work; there were specific
cases with your new query that did not work (I mentioned those above).

Although MySQL might support DISTINCT directly on LOB, JPA prohibits it because other databases
either don't support it or don't support it correctly. 

> Ambari StageDAO.findByCommandStatuses causes Postgress HIGH CPU
> ---------------------------------------------------------------
>
>                 Key: AMBARI-9334
>                 URL: https://issues.apache.org/jira/browse/AMBARI-9334
>             Project: Ambari
>          Issue Type: Bug
>          Components: ambari-server
>    Affects Versions: 1.6.0, 1.6.1, 1.7.0
>         Environment: RHEL 6.4/6.5
> postgresql-server-8.4.13-1.el6_3.x86_64
> postgresql-8.4.13-1.el6_3.x86_64
> postgresql-libs-8.4.13-1.el6_3.x86_64
> ambari-agent-1.6.1-98.x86_64
> ambari-log4j-1.6.1.98-1.noarch
> ambari-server-1.6.1-98.noarch
>            Reporter: Greg Senia
>            Assignee: Jonathan Hurley
>            Priority: Critical
>             Fix For: 2.0.0
>
>         Attachments: AMBARI-9334.patch, AMBARI-9334.patch.2, Screen Shot 2015-01-26 at
12.18.56 PM.png, Screen Shot 2015-01-26 at 12.19.26 PM.png, StageDAO-1.6.1.patch, StageDAO.java
>
>
> The following code that generates a query causes postgres to use lots of CPU espcially
if the Ambari DB grows over time. We reduced CPU by 30-40% by fixing the code below.
> Before:
> public List<StageEntity> findByCommandStatuses(Collection<HostRoleStatus>
statuses) {
> TypedQuery<StageEntity> query = entityManagerProvider.get().createQuery("SELECT
stage " +
> "FROM StageEntity stage WHERE stage.stageId IN (SELECT hrce.stageId FROM " +
> "HostRoleCommandEntity hrce WHERE stage.requestId = hrce.requestId and hrce.status IN
?1 ) " +
> "ORDER BY stage.requestId, stage.stageId", StageEntity.class);
> return daoUtils.selectList(query, statuses);
> }
> After:
> @RequiresSession
> public List<StageEntity> findByCommandStatuses(Collection<HostRoleStatus>
statuses) {
> TypedQuery<StageEntity> query = entityManagerProvider.get().createQuery("SELECT
stage "+
> "FROM StageEntity stage, HostRoleCommandEntity hrce " +
> "WHERE stage.requestId = hrce.requestId AND stage.stageId = hrce.stageId and hrce.status
IN ?1 " +
> "ORDER BY stage.requestId, stage.stageId", StageEntity.class);
> return daoUtils.selectList(query, statuses);
> }
> Before EXPLAIN ANALYZE:
> ambari=> explain analyze SELECT t0.stage_id, t0.cluster_host_info, t0.cluster_id,
t0.log_info, t0.request_context, t0.request_id FROM stage t0 WHERE t0.stage_id IN (SELECT
t1.stage_id FROM host_role_command t1 WHERE ((t0.request_id = t1.request_id) AND (t1.status
IN ('QUEUED','IN_PROGRESS','PENDING')))) ORDER BY t0.request_id, t0.stage_id
> ;
>                                                              QUERY PLAN             
                                              
> ------------------------------------------------------------------------------------------------------------------------------------
> Sort  (cost=7407488.50..7407492.69 rows=1676 width=894) (actual time=55418.086..55418.086
rows=0 loops=1)
>    Sort Key: t0.request_id, t0.stage_id
>    Sort Method:  quicksort  Memory: 25kB
>    ->  Seq Scan on stage t0  (cost=0.00..7407398.75 rows=1676 width=894) (actual time=55418.081..55418.081
rows=0 loops=1)
>          Filter: (SubPlan 1)
>          SubPlan 1
>            ->  Seq Scan on host_role_command t1  (cost=0.00..4418.07 rows=1 width=8)
(actual time=16.514..16.514 rows=0 loops=3353)
>                  Filter: (($0 = request_id) AND ((status)::text = ANY ('{QUEUED,IN_PROGRESS,PENDING}'::text[])))
> Total runtime: 55418.123 ms
> (9 rows)
> After: Explain Analyze:
> ambari=> explain analyze SELECT t0.stage_id, t0.cluster_host_info, t0.cluster_id,
t0.log_info, t0.request_context, t0.request_id FROM stage t0, host_role_command t1 WHERE ((t0.request_id
= t1.request_id) AND t0.stage_id =t1.stage_id and (t1.status IN ('QUEUED','IN_PROGRESS','PENDING')))
ORDER BY t0.stage_id,t0.request_id
> ;
>                                                            QUERY PLAN               
                                          
> --------------------------------------------------------------------------------------------------------------------------------
> Sort  (cost=4346.51..4346.52 rows=2 width=894) (actual time=53.605..53.605 rows=0 loops=1)
>    Sort Key: t0.stage_id, t0.request_id
>    Sort Method:  quicksort  Memory: 25kB
>    ->  Nested Loop  (cost=0.00..4346.50 rows=2 width=894) (actual time=53.596..53.596
rows=0 loops=1)
>          ->  Seq Scan on host_role_command t1  (cost=0.00..4338.22 rows=1 width=16)
(actual time=53.595..53.595 rows=0 loops=1)
>                Filter: ((status)::text = ANY ('{QUEUED,IN_PROGRESS,PENDING}'::text[]))
>          ->  Index Scan using stage_pkey on stage t0  (cost=0.00..8.27 rows=1 width=894)
(never executed)
>                Index Cond: ((t0.stage_id = t1.stage_id) AND (t0.request_id = t1.request_id))
> Total runtime: 53.654 ms
> (9 rows)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message