ambari-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Hudson (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (AMBARI-10050) Querying For Requests By Task Status Has Poor Performance
Date Thu, 12 Mar 2015 22:25:38 GMT

    [ https://issues.apache.org/jira/browse/AMBARI-10050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14359545#comment-14359545
] 

Hudson commented on AMBARI-10050:
---------------------------------

SUCCESS: Integrated in Ambari-trunk-Commit #2018 (See [https://builds.apache.org/job/Ambari-trunk-Commit/2018/])
AMBARI-10050 - Querying For Requests By Task Status Has Poor Performance (jonathanhurley)
(jhurley: http://git-wip-us.apache.org/repos/asf?p=ambari.git&a=commit&h=ad8536b9561060529e74aa5721d80e28b492724e)
* ambari-server/src/main/java/org/apache/ambari/server/orm/dao/HostRoleCommandDAO.java
* ambari-server/src/main/java/org/apache/ambari/server/actionmanager/HostRoleStatus.java
* ambari-server/src/main/java/org/apache/ambari/server/state/cluster/ClustersImpl.java
* ambari-server/src/test/java/org/apache/ambari/server/orm/dao/RequestDAOTest.java
* ambari-server/src/main/java/org/apache/ambari/server/orm/dao/RequestDAO.java
* ambari-server/src/main/java/org/apache/ambari/server/controller/AmbariServer.java
* ambari-server/src/main/java/org/apache/ambari/server/actionmanager/ActionDBAccessorImpl.java


> Querying For Requests By Task Status Has Poor Performance
> ---------------------------------------------------------
>
>                 Key: AMBARI-10050
>                 URL: https://issues.apache.org/jira/browse/AMBARI-10050
>             Project: Ambari
>          Issue Type: Bug
>          Components: ambari-server
>    Affects Versions: 2.0.0
>            Reporter: Jonathan Hurley
>            Assignee: Jonathan Hurley
>            Priority: Critical
>             Fix For: 2.0.0
>
>         Attachments: AMBARI-10050.patch
>
>
> When querying for the requests that are either IN_PROGRESS, FAILED or COMPLETED, the
query being used is inefficient and can cause a wait of up to 10 minutes in a cluster where
there is a large number of stages and tasks.
> {{HostRoleCommandDAO.getRequestsByTaskStatus(...)}}
> This SQL seems overly complex for what it is. Removing the nested SELECT seems like a
great way to reduce the query time:
> {code}
> SELECT DISTINCT task.request_id as request_id
>   FROM host_role_command task WHERE task.status IN ( 'COMPLETED', 'FAILED', 'TIMEDOUT',
'ABORTED')
>   ORDER BY task.request_id ASC;
> {code}
> ... or if we want to keep the NOT IN
> {code}
> SELECT DISTINCT task.request_id as task_id
>   FROM host_role_command task WHERE task.status NOT IN ( 'QUEUED', 'IN_PROGRESS', 
>                                               'PENDING', 'HOLDING', 
>                                               'HOLDING_FAILED', 
>                                               'HOLDING_TIMEOUT' )
>   ORDER BY task.request_id ASC
>   LIMIT 1000
> {code}
> But to be honest, my suggestion is to rewrite this as a simple query that matches from
an {{EnumSet}} found in {{HostRoleStatus}}.
> Essentially, the problem here is that {{getRequestsByStatus}} is trying to do the calculation
work to determine the request status from task status all in SQL. This method should be broken
out into 2 SQL queries:
> - My above query for IN_PROGRESS or FAILED requests
> - A new query for COMPLETED that looks for any requests where all tasks have completed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message