ambari-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Jonathan Hurley <jhur...@hortonworks.com>
Subject Re: Review Request 43967: Express Upgrade Stuck At Manual Prompt Due To HRC Status Calculation Cache Problem
Date Thu, 25 Feb 2016 03:55:44 GMT


> On Feb. 24, 2016, 7:22 p.m., Sid Wagle wrote:
> > Although I like the pattern here because it is re-usable, isn't this caused by a
missing Readlock on the cache re-load? Could this not have been handled correctly with application
level read/write locks? The reader which loads the cache needs to wait on the writeLock and
the Transaction boundary is within the write lock to gurantee the transaction has committed
already.
> > This does achieve the end result my only concern is introduciton of another locking
pattern and then un-intentional or mis-guided re-use by someone else.

Thanks for the review, Sid. The problem is actually caused because of how the @Transactional
method wraps around the DAO method. Consider this case:

```
@Transactional
public void foo(Entity entity){
  EntityManager.merge(entity)
  cache.invalidate(entity)
}
```

On the surface, this looks fine. But what's actually happening is that the cache is being
invalidated before the transaction is committed. Now what happens if a read comes in after
the cache invalidation, but before the @Transactional interceptor is able to commit the transaction?
The cache will get the old value read from the database and cached as new, which causes the
problem.

Even if I were to introduce a read lock in the cache, the above `foo()` method would have
to release its wrote lock before exiting. And that still happens _before_ the transaction
is committed. So, anything waiting on the read lock would then be notified to start work,
effectively reading a non-committed transaction. It's the same problem.


- Jonathan


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/43967/#review120624
-----------------------------------------------------------


On Feb. 24, 2016, 6:53 p.m., Jonathan Hurley wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/43967/
> -----------------------------------------------------------
> 
> (Updated Feb. 24, 2016, 6:53 p.m.)
> 
> 
> Review request for Ambari, Alejandro Fernandez, Nate Cole, Sumit Mohanty, Sebastian Toader,
and Sid Wagle.
> 
> 
> Bugs: AMBARI-15173
>     https://issues.apache.org/jira/browse/AMBARI-15173
> 
> 
> Repository: ambari
> 
> 
> Description
> -------
> 
> Seen while performing an upgrade, it's possible that the status of a request/stage does
not match that of its tasks. Essentially, the task could be {{HOLDING}} while the request
is still {{IN_PROGRESS}}.
> 
> I believe that AMBARI-15011 is responsible for this issue. AMBARI-15011 introduced, among
other things, a cache to the {{HostRoleCommandStatusSummaryDTO}} which is a aggregation of
the number of tasks a stage has in each state (PENDING, HOLDING, etc).
> 
> This {{HostRoleCommandStatusSummaryDTO}} is used by {{CalculatedState}} to calculate
a stage's and request's status based on the tasks. 
> 
> The problem is that {{ServerActionExecutor}} is moving a tasks's state to {{HOLDING}}
(reflected in the database correctly) but the cache invalidation happens inside the uncommitted
transaction. This causes stale data to be re-cached. So, when we go to calculate the request
and state status, we get {{IN_PROGRESS}} instead of {{HOLDING}}.
> 
> {code}
> {
>   "href": "http://172.22.72.13:8080/api/v1/clusters/cl1/requests/61/stages/1?fields=*,tasks/*",
>   "Stage": {
>     "cluster_name": "cl1",
>     "context": "Stop YARN Queues",
>     "display_status": "IN_PROGRESS",
>     "end_time": -1,
>     "progress_percent": 35,
>     "request_id": 61,
>     "skippable": true,
>     "stage_id": 1,
>     "start_time": 1456227329191,
>     "status": "IN_PROGRESS"
>   },
>   "tasks": [
>     {
>       "href": "http://172.22.72.13:8080/api/v1/clusters/cl1/requests/61/stages/1/tasks/754",
>       "Tasks": {
>         "attempt_cnt": 1,
>         "cluster_name": "cl1",
>         "command": "EXECUTE",
>         "command_detail": "Before continuing, please stop all YARN queues. If yarn-site's
yarn.resourcemanager.work-preserving-recovery.enabled is set to true, then you can skip this
step since the clients will retry on their own.",
>         "custom_command_name": "org.apache.ambari.server.serveraction.upgrades.ManualStageAction",
>         "end_time": -1,
>         "error_log": "errors-754.txt",
>         "exit_code": 0,
>         "host_name": "os-r6-mkqzcs-c10tom21unsecha-6.novalocal",
>         "id": 754,
>         "output_log": "output-754.txt",
>         "request_id": 61,
>         "role": "AMBARI_SERVER_ACTION",
>         "stage_id": 1,
>         "start_time": 1456227329191,
>         "status": "HOLDING",
>         "stderr": "",
>         "stdout": "",
>         "structured_out": {}
>       }
>     }
>   ]
> }
> {code}
> 
> 
> Diffs
> -----
> 
>   ambari-server/src/main/java/com/google/inject/persist/jpa/AmbariJpaPersistModule.java
4e4dd35 
>   ambari-server/src/main/java/org/apache/ambari/annotations/TransactionalLock.java PRE-CREATION

>   ambari-server/src/main/java/org/apache/ambari/server/orm/AmbariJpaLocalTxnInterceptor.java
6d7901c 
>   ambari-server/src/main/java/org/apache/ambari/server/orm/TransactionalLockInterceptor.java
PRE-CREATION 
>   ambari-server/src/main/java/org/apache/ambari/server/orm/TransactionalLocks.java PRE-CREATION

>   ambari-server/src/main/java/org/apache/ambari/server/orm/dao/HostRoleCommandDAO.java
deca9b1 
>   ambari-server/src/main/resources/stacks/HDP/2.3/upgrades/upgrade-2.4.xml 29ebc1f 
> 
> Diff: https://reviews.apache.org/r/43967/diff/
> 
> 
> Testing
> -------
> 
> Pending unit tests...
> 
> 
> Thanks,
> 
> Jonathan Hurley
> 
>


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message