ambari-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Hadoop QA (JIRA)" <>
Subject [jira] [Commented] (AMBARI-10050) Querying For Requests By Task Status Has Poor Performance
Date Thu, 12 Mar 2015 20:11:38 GMT


Hadoop QA commented on AMBARI-10050:

{color:red}-1 overall{color}.  Here are the results of testing the latest attachment
  against trunk revision .

    {color:green}+1 @author{color}.  The patch does not contain any @author tags.

    {color:red}-1 tests included{color}.  The patch doesn't appear to include any new or modified
                        Please justify why no new tests are needed for this patch.
                        Also please list what manual steps were performed to verify this patch.

    {color:green}+1 javac{color}.  The applied patch does not increase the total number of
javac compiler warnings.

    {color:green}+1 release audit{color}.  The applied patch does not increase the total number
of release audit warnings.

    {color:red}-1 core tests{color}.  The test build failed in ambari-server 

Test results:
Console output:

This message is automatically generated.

> Querying For Requests By Task Status Has Poor Performance
> ---------------------------------------------------------
>                 Key: AMBARI-10050
>                 URL:
>             Project: Ambari
>          Issue Type: Bug
>          Components: ambari-server
>    Affects Versions: 2.0.0
>            Reporter: Jonathan Hurley
>            Assignee: Jonathan Hurley
>            Priority: Critical
>             Fix For: 2.0.0
>         Attachments: AMBARI-10050.patch
> When querying for the requests that are either IN_PROGRESS, FAILED or COMPLETED, the
query being used is inefficient and can cause a wait of up to 10 minutes in a cluster where
there is a large number of stages and tasks.
> {{HostRoleCommandDAO.getRequestsByTaskStatus(...)}}
> This SQL seems overly complex for what it is. Removing the nested SELECT seems like a
great way to reduce the query time:
> {code}
> SELECT DISTINCT task.request_id as request_id
>   FROM host_role_command task WHERE task.status IN ( 'COMPLETED', 'FAILED', 'TIMEDOUT',
>   ORDER BY task.request_id ASC;
> {code}
> ... or if we want to keep the NOT IN
> {code}
> SELECT DISTINCT task.request_id as task_id
>   FROM host_role_command task WHERE task.status NOT IN ( 'QUEUED', 'IN_PROGRESS', 
>                                               'PENDING', 'HOLDING', 
>                                               'HOLDING_FAILED', 
>                                               'HOLDING_TIMEOUT' )
>   ORDER BY task.request_id ASC
>   LIMIT 1000
> {code}
> But to be honest, my suggestion is to rewrite this as a simple query that matches from
an {{EnumSet}} found in {{HostRoleStatus}}.
> Essentially, the problem here is that {{getRequestsByStatus}} is trying to do the calculation
work to determine the request status from task status all in SQL. This method should be broken
out into 2 SQL queries:
> - My above query for IN_PROGRESS or FAILED requests
> - A new query for COMPLETED that looks for any requests where all tasks have completed.

This message was sent by Atlassian JIRA

View raw message