ambari-dev mailing list archives

From Jonathan Hurley <>
Subject Re: Review Request 43967: Express Upgrade Stuck At Manual Prompt Due To HRC Status Calculation Cache Problem
Date Fri, 26 Feb 2016 19:09:48 GMT

(Updated Feb. 26, 2016, 2:09 p.m.)

Review request for Ambari, Alejandro Fernandez, Nate Cole, Sumit Mohanty, Sebastian Toader,
and Sid Wagle.


Latest update with comments applied and tests.

The unit tests caught a problem with my use of HashSet: a set squashes multiple calls to lock
on the same lock into a single entry, which we don't want. Instead, use a list to track the
calls 1:1, then use a descending iterator to walk back up the chain, unlocking as it goes.
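
As a rough illustration of the list-plus-descending-iterator approach described above, here
is a minimal sketch. It is not the code from the patch: the class name LockChainSketch and
its methods are hypothetical, and ReentrantLock stands in for whatever lock type the real
code uses.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.LinkedList;
import java.util.List;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch: names are illustrative and do not come from the Ambari patch.
final class LockChainSketch {

    /** Acquires each requested lock in order, recording one list entry per lock() call. */
    static LinkedList<ReentrantLock> lockAll(List<ReentrantLock> requested) {
        LinkedList<ReentrantLock> acquired = new LinkedList<>();
        for (ReentrantLock lock : requested) {
            lock.lock();
            // A list keeps duplicates, so a lock taken twice is recorded twice.
            acquired.add(lock);
        }
        return acquired;
    }

    /** Walks the recorded calls in reverse order, issuing one unlock() per recorded lock(). */
    static void unlockAll(LinkedList<ReentrantLock> acquired) {
        Iterator<ReentrantLock> it = acquired.descendingIterator();
        while (it.hasNext()) {
            it.next().unlock();
        }
    }

    public static void main(String[] args) {
        ReentrantLock a = new ReentrantLock();
        ReentrantLock b = new ReentrantLock();

        // Lock 'a' is requested twice, so the current thread holds it twice.
        LinkedList<ReentrantLock> acquired = lockAll(Arrays.asList(a, b, a));
        System.out.println("holds on a after locking: " + a.getHoldCount());

        unlockAll(acquired);
        System.out.println("holds on a after unlocking: " + a.getHoldCount());
    }
}
```

With a HashSet in place of the list, the second entry for the same lock would be dropped, so
only one unlock would be issued for two lock calls and the lock would remain held.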

Bugs: AMBARI-15173

Repository: ambari


While performing an upgrade, the status of a request/stage can fail to match that of its
tasks. Essentially, the task could be {{HOLDING}} while the request is still

I believe that AMBARI-15011 is responsible for this issue. AMBARI-15011 introduced, among
other things, a cache for the {{HostRoleCommandStatusSummaryDTO}}, which is an aggregation of
the number of tasks a stage has in each state (PENDING, HOLDING, etc).

This {{HostRoleCommandStatusSummaryDTO}} is used by {{CalculatedState}} to calculate a stage's
and request's status based on the tasks. 
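
To make the idea of a per-state summary concrete, here is a simplified sketch of deriving a
stage status from task counts. It is not the real {{CalculatedState}} logic: the class
StatusSummarySketch, its reduced Status enum, and the precedence rules are all assumptions
made for illustration.

```java
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: a reduced model of a status summary, not Ambari's implementation.
final class StatusSummarySketch {

    /** Reduced set of task states, for illustration only. */
    enum Status { PENDING, IN_PROGRESS, HOLDING, COMPLETED }

    /** Counts how many tasks are in each state (the "summary"). */
    static Map<Status, Integer> summarize(List<Status> taskStatuses) {
        Map<Status, Integer> counts = new EnumMap<>(Status.class);
        for (Status status : taskStatuses) {
            counts.merge(status, 1, Integer::sum);
        }
        return counts;
    }

    /** Derives a stage status from the counts, using assumed precedence rules. */
    static Status stageStatus(Map<Status, Integer> counts) {
        int total = 0;
        for (int count : counts.values()) {
            total += count;
        }
        // Assumption: any holding task puts the whole stage on hold.
        if (counts.getOrDefault(Status.HOLDING, 0) > 0) {
            return Status.HOLDING;
        }
        if (counts.getOrDefault(Status.PENDING, 0) == total) {
            return Status.PENDING;
        }
        if (counts.getOrDefault(Status.COMPLETED, 0) == total) {
            return Status.COMPLETED;
        }
        return Status.IN_PROGRESS;
    }
}
```

The point of the sketch is that the derived status is only as fresh as the counts it is fed:
if the counts come from a stale cache entry, the stage status is stale too.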

The problem is that {{ServerActionExecutor}} is moving a task's state to {{HOLDING}} (reflected
in the database correctly) but the cache invalidation happens inside the uncommitted transaction.
Any read that lands between the invalidation and the commit still sees the old row, so stale
data is re-cached. So, when we go to calculate the request and stage status, we get
{{IN_PROGRESS}} instead of {{HOLDING}}.

{
  "href": "*,tasks/*",
  "Stage": {
    "cluster_name": "cl1",
    "context": "Stop YARN Queues",
    "display_status": "IN_PROGRESS",
    "end_time": -1,
    "progress_percent": 35,
    "request_id": 61,
    "skippable": true,
    "stage_id": 1,
    "start_time": 1456227329191,
    "status": "IN_PROGRESS"
  },
  "tasks": [
    {
      "href": "",
      "Tasks": {
        "attempt_cnt": 1,
        "cluster_name": "cl1",
        "command": "EXECUTE",
        "command_detail": "Before continuing, please stop all YARN queues. If yarn-site's is set to true, then you can skip this step since the clients will retry on their own.",
        "custom_command_name": "org.apache.ambari.server.serveraction.upgrades.ManualStageAction",
        "end_time": -1,
        "error_log": "errors-754.txt",
        "exit_code": 0,
        "host_name": "os-r6-mkqzcs-c10tom21unsecha-6.novalocal",
        "id": 754,
        "output_log": "output-754.txt",
        "request_id": 61,
        "role": "AMBARI_SERVER_ACTION",
        "stage_id": 1,
        "start_time": 1456227329191,
        "status": "HOLDING",
        "stderr": "",
        "stdout": "",
        "structured_out": {}
      }
    }
  ]
}
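
The ordering problem behind this mismatch can be reproduced in a small single-threaded
simulation. Everything here is a hypothetical stand-in: StaleCacheSketch, its two maps, and
its method names are not Ambari classes, and the "commit" is modeled as a plain map write.
The sketch contrasts invalidating before the commit with invalidating after it; the latter
is one common remedy and is shown only as an illustration, not as a description of the patch.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical simulation: models committed rows and a read-through cache with two maps.
final class StaleCacheSketch {

    /** Committed status per stage id, i.e. what any other reader sees in the database. */
    private final Map<Long, String> committed = new ConcurrentHashMap<>();

    /** Cached status per stage id, loaded from committed data on a miss. */
    private final Map<Long, String> cache = new ConcurrentHashMap<>();

    StaleCacheSketch(long stageId, String initialStatus) {
        committed.put(stageId, initialStatus);
    }

    /** Read-through lookup: a cache miss reloads from the committed data. */
    String readStatus(long stageId) {
        return cache.computeIfAbsent(stageId, committed::get);
    }

    /** Problem ordering: invalidate while the write is still uncommitted. */
    void updateInvalidatingBeforeCommit(long stageId, String newStatus) {
        cache.remove(stageId);             // invalidation inside the open transaction
        readStatus(stageId);               // a reader arrives and re-caches the old value
        committed.put(stageId, newStatus); // the commit lands too late for the cache
    }

    /** Alternative ordering: make the write visible first, then invalidate. */
    void updateInvalidatingAfterCommit(long stageId, String newStatus) {
        committed.put(stageId, newStatus);
        cache.remove(stageId);
    }

    public static void main(String[] args) {
        StaleCacheSketch broken = new StaleCacheSketch(1L, "IN_PROGRESS");
        broken.readStatus(1L);
        broken.updateInvalidatingBeforeCommit(1L, "HOLDING");
        System.out.println("invalidate before commit: " + broken.readStatus(1L));

        StaleCacheSketch fixed = new StaleCacheSketch(1L, "IN_PROGRESS");
        fixed.readStatus(1L);
        fixed.updateInvalidatingAfterCommit(1L, "HOLDING");
        System.out.println("invalidate after commit: " + fixed.readStatus(1L));
    }
}
```

In the first case the cache keeps returning IN_PROGRESS even though the committed value is
HOLDING, which matches the stage/task mismatch in the response above.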

Diffs (updated)

  ambari-server/src/main/java/com/google/inject/persist/jpa/ 604546c

  ambari-server/src/test/java/org/apache/ambari/annotations/ fbaa343



Pending unit tests...


Jonathan Hurley
