ambari-dev mailing list archives

From Sebastian Toader <stoa...@hortonworks.com>
Subject Re: Review Request 43967: Express Upgrade Stuck At Manual Prompt Due To HRC Status Calculation Cache Problem
Date Thu, 25 Feb 2016 14:01:58 GMT


> On Feb. 25, 2016, 9:21 a.m., Sebastian Toader wrote:
> > ambari-server/src/main/java/org/apache/ambari/server/orm/AmbariJpaLocalTxnInterceptor.java, line 107
> > <https://reviews.apache.org/r/43967/diff/2/?file=1268514#file1268514line107>
> >
> >     Shouldn't the transactional locks be released in the finally blocks here?
> >     
> >     The TransactionalLockInterceptor will release the lock when the method annotated
> >     with TransactionalLock exits. However, the currently running transaction may not be
> >     committed when the TransactionalLockInterceptor exits. The lifespan of the transaction
> >     is controlled by the AmbariJpaLocalTxnInterceptor.invoke() method, so that is where we
> >     know for certain when the transaction is committed or rolled back.
> 
> Jonathan Hurley wrote:
>     The method interceptors run in the order in which they were bound. The
>     TransactionalLockInterceptor is bound first, so it will run first and invoke
>     Joinpoint.proceed(). So basically, this is the ordering:
>     
>     TransactionalLockInterceptor 
>       lock
>       Joinpoint.proceed() -> Transactional
>         begin transaction
>         do stuff
>         end transaction
>       release lock
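
The nesting described above can be sketched in plain Java. This is only an illustration under stated assumptions: it does not use Guice or the AOP Alliance types, and the names `InterceptorOrderingSketch`, `Interceptor`, `LockInterceptor`, and `TxnInterceptor` are hypothetical stand-ins, not Ambari classes. It shows that when the lock interceptor is the outer wrapper, its `finally` block runs only after the inner transaction wrapper has finished.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantLock;

public class InterceptorOrderingSketch {
    /** Records events in the order they happen so the nesting can be inspected. */
    static final List<String> events = new ArrayList<>();

    /** Hypothetical stand-in for a method interceptor: wraps the next step in the chain. */
    interface Interceptor {
        void invoke(Runnable proceed);
    }

    /** Outer wrapper (bound first): holds the lock for the whole inner invocation. */
    static class LockInterceptor implements Interceptor {
        private final ReentrantLock lock = new ReentrantLock();

        @Override
        public void invoke(Runnable proceed) {
            lock.lock();
            events.add("lock");
            try {
                proceed.run(); // the inner interceptor runs entirely inside the lock
            } finally {
                events.add("release lock");
                lock.unlock();
            }
        }
    }

    /** Inner wrapper: stands in for the transaction begin/end boundary. */
    static class TxnInterceptor implements Interceptor {
        @Override
        public void invoke(Runnable proceed) {
            events.add("begin transaction");
            try {
                proceed.run();
            } finally {
                events.add("end transaction");
            }
        }
    }

    /** Runs the chain once and returns a copy of the recorded event order. */
    static List<String> run() {
        events.clear();
        Interceptor outer = new LockInterceptor();
        Interceptor inner = new TxnInterceptor();
        outer.invoke(() -> inner.invoke(() -> events.add("do stuff")));
        return new ArrayList<>(events);
    }

    public static void main(String[] args) {
        run().forEach(System.out::println);
    }
}
```

Calling `InterceptorOrderingSketch.run()` returns the events in the same order as the outline quoted above, with the lock released last.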

Thanks for the clarification.


- Sebastian


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/43967/#review120675
-----------------------------------------------------------


On Feb. 25, 2016, 12:53 a.m., Jonathan Hurley wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/43967/
> -----------------------------------------------------------
> 
> (Updated Feb. 25, 2016, 12:53 a.m.)
> 
> 
> Review request for Ambari, Alejandro Fernandez, Nate Cole, Sumit Mohanty, Sebastian Toader, and Sid Wagle.
> 
> 
> Bugs: AMBARI-15173
>     https://issues.apache.org/jira/browse/AMBARI-15173
> 
> 
> Repository: ambari
> 
> 
> Description
> -------
> 
> While performing an upgrade, the status of a request or stage can fail to match that of
> its tasks. Essentially, a task can be {{HOLDING}} while the request is still
> {{IN_PROGRESS}}.
> 
> I believe that AMBARI-15011 is responsible for this issue. AMBARI-15011 introduced, among
> other things, a cache for the {{HostRoleCommandStatusSummaryDTO}}, which is an aggregation
> of the number of tasks a stage has in each state (PENDING, HOLDING, etc).
> 
> This {{HostRoleCommandStatusSummaryDTO}} is used by {{CalculatedState}} to calculate
> a stage's and request's status based on the tasks.
> 
> The problem is that {{ServerActionExecutor}} is moving a task's state to {{HOLDING}}
> (reflected in the database correctly) but the cache invalidation happens inside the
> uncommitted transaction. This causes stale data to be re-cached. So, when we go to
> calculate the request and stage status, we get {{IN_PROGRESS}} instead of {{HOLDING}}.
> 
> {code}
> {
>   "href": "http://172.22.72.13:8080/api/v1/clusters/cl1/requests/61/stages/1?fields=*,tasks/*",
>   "Stage": {
>     "cluster_name": "cl1",
>     "context": "Stop YARN Queues",
>     "display_status": "IN_PROGRESS",
>     "end_time": -1,
>     "progress_percent": 35,
>     "request_id": 61,
>     "skippable": true,
>     "stage_id": 1,
>     "start_time": 1456227329191,
>     "status": "IN_PROGRESS"
>   },
>   "tasks": [
>     {
>       "href": "http://172.22.72.13:8080/api/v1/clusters/cl1/requests/61/stages/1/tasks/754",
>       "Tasks": {
>         "attempt_cnt": 1,
>         "cluster_name": "cl1",
>         "command": "EXECUTE",
>         "command_detail": "Before continuing, please stop all YARN queues. If yarn-site's yarn.resourcemanager.work-preserving-recovery.enabled is set to true, then you can skip this step since the clients will retry on their own.",
>         "custom_command_name": "org.apache.ambari.server.serveraction.upgrades.ManualStageAction",
>         "end_time": -1,
>         "error_log": "errors-754.txt",
>         "exit_code": 0,
>         "host_name": "os-r6-mkqzcs-c10tom21unsecha-6.novalocal",
>         "id": 754,
>         "output_log": "output-754.txt",
>         "request_id": 61,
>         "role": "AMBARI_SERVER_ACTION",
>         "stage_id": 1,
>         "start_time": 1456227329191,
>         "status": "HOLDING",
>         "stderr": "",
>         "stdout": "",
>         "structured_out": {}
>       }
>     }
>   ]
> }
> {code}
> 
> 
> Diffs
> -----
> 
>   ambari-server/src/main/java/com/google/inject/persist/jpa/AmbariJpaPersistModule.java 4e4dd35 
>   ambari-server/src/main/java/org/apache/ambari/annotations/TransactionalLock.java PRE-CREATION 
>   ambari-server/src/main/java/org/apache/ambari/server/orm/AmbariJpaLocalTxnInterceptor.java 6d7901c 
>   ambari-server/src/main/java/org/apache/ambari/server/orm/TransactionalLockInterceptor.java PRE-CREATION 
>   ambari-server/src/main/java/org/apache/ambari/server/orm/TransactionalLocks.java PRE-CREATION 
>   ambari-server/src/main/java/org/apache/ambari/server/orm/dao/HostRoleCommandDAO.java deca9b1 
>   ambari-server/src/main/resources/stacks/HDP/2.3/upgrades/upgrade-2.4.xml 29ebc1f 
> 
> Diff: https://reviews.apache.org/r/43967/diff/
> 
> 
> Testing
> -------
> 
> Pending unit tests...
> 
> 
> Thanks,
> 
> Jonathan Hurley
> 
>
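
The stale re-cache described in the quoted description can be reproduced with a small, single-threaded simulation. This is a toy model under stated assumptions, not Ambari code: `StaleCacheSketch`, `Store`, and `SummaryCache` are hypothetical names, the "database" is an in-memory field where readers only see committed values, and the concurrent reader is modeled as an ordinary call placed between the write and the commit. The `invalidateAfterCommit` flag shows one generic way to avoid the stale entry; the patch under review adds `TransactionalLock` classes instead, which this sketch does not model.

```java
import java.util.HashMap;
import java.util.Map;

public class StaleCacheSketch {
    /** Toy store: readers only ever see the committed value. */
    static class Store {
        private String committed = "IN_PROGRESS";
        private String pending; // written inside the transaction, not yet visible

        void write(String value) { pending = value; }

        void commit() {
            if (pending != null) {
                committed = pending;
                pending = null;
            }
        }

        String read() { return committed; }
    }

    /** Toy read-through cache keyed by stage id. */
    static class SummaryCache {
        private final Map<Long, String> entries = new HashMap<>();
        private final Store store;

        SummaryCache(Store store) { this.store = store; }

        String get(long stageId) {
            // On a miss, load whatever is currently committed and cache it.
            return entries.computeIfAbsent(stageId, id -> store.read());
        }

        void invalidate(long stageId) { entries.remove(stageId); }
    }

    /** Returns the status a reader sees from the cache after the transaction commits. */
    static String simulate(boolean invalidateAfterCommit) {
        Store store = new Store();
        SummaryCache cache = new SummaryCache(store);
        long stageId = 1L;

        cache.get(stageId);               // warm the cache with IN_PROGRESS
        store.write("HOLDING");           // update inside the open transaction
        if (!invalidateAfterCommit) {
            cache.invalidate(stageId);    // invalidation before the commit
        }
        cache.get(stageId);               // another reader arrives before the commit
        store.commit();
        if (invalidateAfterCommit) {
            cache.invalidate(stageId);    // invalidation once the new value is visible
        }
        return cache.get(stageId);
    }

    public static void main(String[] args) {
        System.out.println("invalidate inside txn:   " + simulate(false));
        System.out.println("invalidate after commit: " + simulate(true));
    }
}
```

With invalidation inside the transaction, the intervening read repopulates the cache from the still-committed `IN_PROGRESS` value, so later reads keep returning it even though the store now holds `HOLDING`.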

