ambari-dev mailing list archives

From "Jonathan Hurley" <jhur...@hortonworks.com>
Subject Re: Review Request 40494: Retrieving Failed Service Checks Takes Too Long On Large Clusters
Date Thu, 19 Nov 2015 19:17:43 GMT

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/40494/
-----------------------------------------------------------

(Updated Nov. 19, 2015, 2:17 p.m.)


Review request for Ambari, Alejandro Fernandez, Jayush Luniya, Nate Cole, Sumit Mohanty, and
Sid Wagle.


Bugs: AMBARI-13974
    https://issues.apache.org/jira/browse/AMBARI-13974


Repository: ambari


Description
-------

Steps to reproduce:
- Launch a Rolling Upgrade on a large cluster (500+ nodes)

The following call fails with a timeout, so no failed Service Checks are shown to the user:

{code}
/api/v1/clusters/c500/upgrades/69/upgrade_groups?upgrade_items/UpgradeItem/status=COMPLETED&upgrade_items/tasks/Tasks/status.in(FAILED,ABORTED,TIMEDOUT)&upgrade_items/tasks/Tasks/command=SERVICE_CHECK&fields=upgrade_items/tasks/Tasks/command_detail,upgrade_items/tasks/Tasks/status&minimal_response=true
{code}

The root of the problem is how the REST API handles subqueries. For every upgrade group that
matches, it attempts to retrieve every stage and every task, and only then filters them with
in-memory comparisons to produce the slice of results.

Because these predicates are simple comparisons on DB fields, the filtering should go through
the JPA layer instead.
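
As a rough illustration of what filtering in the JPA layer could look like, the sketch below builds a single parameterized JPQL query that carries the status and command predicates from the API call above, so the database returns only the matching rows. It is not taken from the patch: the class `FailedServiceCheckQuery`, the helper `failedTasks`, the entity alias, and the field names `requestId`, `roleCommand`, and `status` are all assumptions made for the example, as is the use of 69 as a request id.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class FailedServiceCheckQuery {

  /** A JPQL string plus its named parameters (hypothetical helper type). */
  static final class ParameterizedQuery {
    final String jpql;
    final Map<String, Object> parameters;

    ParameterizedQuery(String jpql, Map<String, Object> parameters) {
      this.jpql = jpql;
      this.parameters = parameters;
    }
  }

  /**
   * Builds one query whose WHERE clause does the filtering, instead of
   * loading every task and comparing in memory.
   * Entity and field names are illustrative assumptions.
   */
  static ParameterizedQuery failedTasks(long requestId, String command, List<String> statuses) {
    String jpql = "SELECT task FROM HostRoleCommandEntity task"
        + " WHERE task.requestId = :requestId"
        + " AND task.roleCommand = :command"
        + " AND task.status IN :statuses";

    Map<String, Object> parameters = new LinkedHashMap<>();
    parameters.put("requestId", requestId);
    parameters.put("command", command);
    parameters.put("statuses", new ArrayList<>(statuses));
    return new ParameterizedQuery(jpql, parameters);
  }

  public static void main(String[] args) {
    // Values mirror the predicates in the failing API call.
    ParameterizedQuery query = failedTasks(
        69L, "SERVICE_CHECK", Arrays.asList("FAILED", "ABORTED", "TIMEDOUT"));

    System.out.println(query.jpql);
    System.out.println(query.parameters);
    // With JPA on the classpath, the string would be passed to
    // EntityManager.createQuery(...) and each entry bound via setParameter(name, value).
  }
}
```

The point of the sketch is only that the three comparisons fit in one WHERE clause; whether the change uses JPQL strings or the criteria API is not stated in this message.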


Diffs (updated)
-----

  ambari-server/src/main/java/org/apache/ambari/server/actionmanager/ActionDBAccessor.java 873261f
  ambari-server/src/main/java/org/apache/ambari/server/actionmanager/ActionDBAccessorImpl.java 5646156
  ambari-server/src/main/java/org/apache/ambari/server/actionmanager/ActionManager.java f168ac6
  ambari-server/src/main/java/org/apache/ambari/server/actionmanager/HostRoleCommand.java cd2e528
  ambari-server/src/main/java/org/apache/ambari/server/controller/AmbariManagementController.java 0eef06c
  ambari-server/src/main/java/org/apache/ambari/server/controller/AmbariManagementControllerImpl.java 2001a7d
  ambari-server/src/main/java/org/apache/ambari/server/controller/TaskStatusRequest.java c966e7f
  ambari-server/src/main/java/org/apache/ambari/server/controller/TaskStatusResponse.java 892b1c3
  ambari-server/src/main/java/org/apache/ambari/server/controller/internal/TaskResourceProvider.java 1806b78
  ambari-server/src/main/java/org/apache/ambari/server/orm/dao/HostRoleCommandDAO.java 5db8c42
  ambari-server/src/main/java/org/apache/ambari/server/orm/entities/HostEntity_.java PRE-CREATION
  ambari-server/src/main/java/org/apache/ambari/server/orm/entities/HostRoleCommandEntity_.java 4dad21a
  ambari-server/src/test/java/org/apache/ambari/server/controller/AmbariManagementControllerTest.java 4a80c4f
  ambari-server/src/test/java/org/apache/ambari/server/controller/internal/AbstractResourceProviderTest.java 45ab2df
  ambari-server/src/test/java/org/apache/ambari/server/controller/internal/TaskResourceProviderTest.java 6feb2cc

Diff: https://reviews.apache.org/r/40494/diff/


Testing
-------

Running a full suite of tests now plus cluster installation and upgrade sanity checks.


Thanks,

Jonathan Hurley

