ambari-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Jonathan Hurley <jhur...@hortonworks.com>
Subject Re: Review Request 43967: Express Upgrade Stuck At Manual Prompt Due To HRC Status Calculation Cache Problem
Date Fri, 26 Feb 2016 17:34:50 GMT


> On Feb. 26, 2016, 12:29 p.m., Nate Cole wrote:
> > ambari-server/src/main/java/org/apache/ambari/server/orm/AmbariJpaLocalTxnInterceptor.java,
lines 76-78
> > <https://reviews.apache.org/r/43967/diff/5/?file=1272172#file1272172line76>
> >
> >     I'm assuming it's a Set for nested @TransactionactionalLock's?  default equals()/hashCode()
sufficient?

Yes, according to the javadoc for the Annotation base class, it is.
https://docs.oracle.com/javase/7/docs/api/java/lang/annotation/Annotation.html#equals(java.lang.Object)

I'm adding tests for this as well to ensure it works the way we think.


> On Feb. 26, 2016, 12:29 p.m., Nate Cole wrote:
> > ambari-server/src/main/java/org/apache/ambari/server/orm/AmbariJpaLocalTxnInterceptor.java,
lines 284-287
> > <https://reviews.apache.org/r/43967/diff/5/?file=1272172#file1272172line284>
> >
> >     If we end up in a nested call here, can this ever be the same lock instances,
and if so will calling lock() repeatedly work?

We can and I think it's OK. Because we're using a ReentrantLock here, we can call lock() as
many times as we want and it will increase the counters in the lock. Does that answer your
question?

My unit test should also be able to test this by invoking a method with a nested Transactional
and ensuring that there's no deadlock waiting.


> On Feb. 26, 2016, 12:29 p.m., Nate Cole wrote:
> > ambari-server/src/main/java/org/apache/ambari/server/orm/AmbariJpaLocalTxnInterceptor.java,
lines 308-310
> > <https://reviews.apache.org/r/43967/diff/5/?file=1272172#file1272172line308>
> >
> >     If nested, will these unlock in the same order they were inserted?  Maybe consider
a LinkedHashSet to preserve it and iterate backwards such that you unlock the last one locked?

Interesting use case; Right now the annotation only allows for a single LockArea, but what
if you hit foo() with LockArea.ONE and bar() with LockArea.TWO; yes, they should probably
be done in reverse. I'll make this change and re-submit.


- Jonathan


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/43967/#review120890
-----------------------------------------------------------


On Feb. 26, 2016, 12:22 p.m., Jonathan Hurley wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/43967/
> -----------------------------------------------------------
> 
> (Updated Feb. 26, 2016, 12:22 p.m.)
> 
> 
> Review request for Ambari, Alejandro Fernandez, Nate Cole, Sumit Mohanty, Sebastian Toader,
and Sid Wagle.
> 
> 
> Bugs: AMBARI-15173
>     https://issues.apache.org/jira/browse/AMBARI-15173
> 
> 
> Repository: ambari
> 
> 
> Description
> -------
> 
> Seen while performing an upgrade, it's possible that the status of a request/stage does
not match that of its tasks. Essentially, the task could be {{HOLDING}} while the request
is still {{IN_PROGRESS}}.
> 
> I believe that AMBARI-15011 is responsible for this issue. AMBARI-15011 introduced, among
other things, a cache to the {{HostRoleCommandStatusSummaryDTO}} which is a aggregation of
the number of tasks a stage has in each state (PENDING, HOLDING, etc).
> 
> This {{HostRoleCommandStatusSummaryDTO}} is used by {{CalculatedState}} to calculate
a stage's and request's status based on the tasks. 
> 
> The problem is that {{ServerActionExecutor}} is moving a tasks's state to {{HOLDING}}
(reflected in the database correctly) but the cache invalidation happens inside the uncommitted
transaction. This causes stale data to be re-cached. So, when we go to calculate the request
and state status, we get {{IN_PROGRESS}} instead of {{HOLDING}}.
> 
> {code}
> {
>   "href": "http://172.22.72.13:8080/api/v1/clusters/cl1/requests/61/stages/1?fields=*,tasks/*",
>   "Stage": {
>     "cluster_name": "cl1",
>     "context": "Stop YARN Queues",
>     "display_status": "IN_PROGRESS",
>     "end_time": -1,
>     "progress_percent": 35,
>     "request_id": 61,
>     "skippable": true,
>     "stage_id": 1,
>     "start_time": 1456227329191,
>     "status": "IN_PROGRESS"
>   },
>   "tasks": [
>     {
>       "href": "http://172.22.72.13:8080/api/v1/clusters/cl1/requests/61/stages/1/tasks/754",
>       "Tasks": {
>         "attempt_cnt": 1,
>         "cluster_name": "cl1",
>         "command": "EXECUTE",
>         "command_detail": "Before continuing, please stop all YARN queues. If yarn-site's
yarn.resourcemanager.work-preserving-recovery.enabled is set to true, then you can skip this
step since the clients will retry on their own.",
>         "custom_command_name": "org.apache.ambari.server.serveraction.upgrades.ManualStageAction",
>         "end_time": -1,
>         "error_log": "errors-754.txt",
>         "exit_code": 0,
>         "host_name": "os-r6-mkqzcs-c10tom21unsecha-6.novalocal",
>         "id": 754,
>         "output_log": "output-754.txt",
>         "request_id": 61,
>         "role": "AMBARI_SERVER_ACTION",
>         "stage_id": 1,
>         "start_time": 1456227329191,
>         "status": "HOLDING",
>         "stderr": "",
>         "stdout": "",
>         "structured_out": {}
>       }
>     }
>   ]
> }
> {code}
> 
> 
> Diffs
> -----
> 
>   ambari-server/src/main/java/com/google/inject/persist/jpa/AmbariJpaPersistModule.java
604546c 
>   ambari-server/src/main/java/org/apache/ambari/server/actionmanager/ActionDBAccessor.java
7f69a31 
>   ambari-server/src/main/java/org/apache/ambari/server/actionmanager/ActionDBAccessorImpl.java
3f4ffeb 
>   ambari-server/src/main/java/org/apache/ambari/server/orm/AmbariJpaLocalTxnInterceptor.java
3c953ca 
>   ambari-server/src/main/java/org/apache/ambari/server/orm/TransactionalLockInterceptor.java
0cf73cb 
> 
> Diff: https://reviews.apache.org/r/43967/diff/
> 
> 
> Testing
> -------
> 
> Pending unit tests...
> 
> 
> Thanks,
> 
> Jonathan Hurley
> 
>


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message