Subject: Re: Review Request 43967: Express Upgrade Stuck At Manual Prompt Due To HRC Status Calculation Cache Problem
From: Jonathan Hurley
To: Sid Wagle, Sumit Mohanty, Sebastian Toader, Alejandro Fernandez, Nate Cole
Cc: Jonathan Hurley, Ambari
Date: Thu, 25 Feb 2016 03:57:19 -0000
Message-ID: <20160225035719.23847.72812@reviews.apache.org>
In-Reply-To: <20160225000758.23847.53820@reviews.apache.org>

> On Feb. 24, 2016, 7:07 p.m., Sumit Mohanty wrote:
> > ambari-server/src/main/java/org/apache/ambari/server/orm/dao/HostRoleCommandDAO.java, line 16
> >
> > During RU/EU do we change the state of some HRC - force change to HOLDING or something like that. Will that need cache invalidation?
>
> Jonathan Hurley wrote:
>     As far as I can see, all updates to HRC statuses are done through this class and all of the methods have the invalidation on them. If there's one that you see which I don't, please point it out so I can take a look.
>
> Sumit Mohanty wrote:
>     I was thinking about when upgrade is paused. But I assume it will end up in the merge/mergeAll call.

When the upgrade is paused, all entities are moved into ABORTED, which will be done via the HRC DAO.

- Jonathan

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/43967/#review120615
-----------------------------------------------------------

On Feb. 24, 2016, 6:53 p.m., Jonathan Hurley wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/43967/
> -----------------------------------------------------------
>
> (Updated Feb. 24, 2016, 6:53 p.m.)
>
>
> Review request for Ambari, Alejandro Fernandez, Nate Cole, Sumit Mohanty, Sebastian Toader, and Sid Wagle.
>
>
> Bugs: AMBARI-15173
>     https://issues.apache.org/jira/browse/AMBARI-15173
>
>
> Repository: ambari
>
>
> Description
> -------
>
> Seen while performing an upgrade: it's possible that the status of a request/stage does not match that of its tasks. Essentially, the task could be {{HOLDING}} while the request is still {{IN_PROGRESS}}.
>
> I believe that AMBARI-15011 is responsible for this issue. AMBARI-15011 introduced, among other things, a cache for the {{HostRoleCommandStatusSummaryDTO}}, which is an aggregation of the number of tasks a stage has in each state (PENDING, HOLDING, etc.).
>
> This {{HostRoleCommandStatusSummaryDTO}} is used by {{CalculatedState}} to calculate a stage's and request's status based on the tasks.
>
> The problem is that {{ServerActionExecutor}} is moving a task's state to {{HOLDING}} (reflected in the database correctly) but the cache invalidation happens inside the uncommitted transaction. A read that arrives before the commit therefore re-caches stale data. So, when we go to calculate the request and stage status, we get {{IN_PROGRESS}} instead of {{HOLDING}}.
>
> {code}
> {
>   "href": "http://172.22.72.13:8080/api/v1/clusters/cl1/requests/61/stages/1?fields=*,tasks/*",
>   "Stage": {
>     "cluster_name": "cl1",
>     "context": "Stop YARN Queues",
>     "display_status": "IN_PROGRESS",
>     "end_time": -1,
>     "progress_percent": 35,
>     "request_id": 61,
>     "skippable": true,
>     "stage_id": 1,
>     "start_time": 1456227329191,
>     "status": "IN_PROGRESS"
>   },
>   "tasks": [
>     {
>       "href": "http://172.22.72.13:8080/api/v1/clusters/cl1/requests/61/stages/1/tasks/754",
>       "Tasks": {
>         "attempt_cnt": 1,
>         "cluster_name": "cl1",
>         "command": "EXECUTE",
>         "command_detail": "Before continuing, please stop all YARN queues. If yarn-site's yarn.resourcemanager.work-preserving-recovery.enabled is set to true, then you can skip this step since the clients will retry on their own.",
>         "custom_command_name": "org.apache.ambari.server.serveraction.upgrades.ManualStageAction",
>         "end_time": -1,
>         "error_log": "errors-754.txt",
>         "exit_code": 0,
>         "host_name": "os-r6-mkqzcs-c10tom21unsecha-6.novalocal",
>         "id": 754,
>         "output_log": "output-754.txt",
>         "request_id": 61,
>         "role": "AMBARI_SERVER_ACTION",
>         "stage_id": 1,
>         "start_time": 1456227329191,
>         "status": "HOLDING",
>         "stderr": "",
>         "stdout": "",
>         "structured_out": {}
>       }
>     }
>   ]
> }
> {code}
>
>
> Diffs
> -----
>
>   ambari-server/src/main/java/com/google/inject/persist/jpa/AmbariJpaPersistModule.java 4e4dd35
>   ambari-server/src/main/java/org/apache/ambari/annotations/TransactionalLock.java PRE-CREATION
>   ambari-server/src/main/java/org/apache/ambari/server/orm/AmbariJpaLocalTxnInterceptor.java 6d7901c
>   ambari-server/src/main/java/org/apache/ambari/server/orm/TransactionalLockInterceptor.java PRE-CREATION
>   ambari-server/src/main/java/org/apache/ambari/server/orm/TransactionalLocks.java PRE-CREATION
>   ambari-server/src/main/java/org/apache/ambari/server/orm/dao/HostRoleCommandDAO.java deca9b1
>   ambari-server/src/main/resources/stacks/HDP/2.3/upgrades/upgrade-2.4.xml 29ebc1f
>
> Diff: https://reviews.apache.org/r/43967/diff/
>
>
> Testing
> -------
>
> Pending unit tests...
>
>
> Thanks,
>
> Jonathan Hurley
>
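
The stale re-caching described in the review request can be illustrated with a small single-threaded sketch. It is not Ambari code: the class, the two maps, and the method names are hypothetical stand-ins for the database and the status summary cache, and the writer/reader interleaving is written out by hand so that the outcome is deterministic. The second method shows one ordering that avoids the stale value; it makes no claim about how the patch under review resolves the problem.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; none of these names exist in Ambari.
class StaleSummaryCacheSketch {
    // Stand-in for the database: committed status of each task, keyed by task id.
    private final Map<Long, String> committedTaskStatus = new HashMap<>();
    // Stand-in for the summary cache: derived stage status, keyed by stage id.
    private final Map<Long, String> cachedStageStatus = new HashMap<>();

    StaleSummaryCacheSketch() {
        committedTaskStatus.put(754L, "IN_PROGRESS");
        readStageStatus(1L, 754L); // prime the cache with IN_PROGRESS
    }

    // Reader: serve from the cache, reloading from committed data on a miss.
    public String readStageStatus(long stageId, long taskId) {
        return cachedStageStatus.computeIfAbsent(stageId, id -> committedTaskStatus.get(taskId));
    }

    // Problem ordering: the cache is invalidated before the update is committed.
    public String invalidateBeforeCommit() {
        cachedStageStatus.remove(1L);              // invalidation inside the open transaction
        readStageStatus(1L, 754L);                 // a reader arrives and re-caches the old value
        committedTaskStatus.put(754L, "HOLDING");  // the transaction commits afterwards
        return readStageStatus(1L, 754L);          // cache hit on the stale entry
    }

    // One possible alternative ordering: commit first, then invalidate.
    public String invalidateAfterCommit() {
        committedTaskStatus.put(754L, "HOLDING");  // the transaction commits
        cachedStageStatus.remove(1L);              // invalidation sees committed data
        return readStageStatus(1L, 754L);          // cache miss reloads the new value
    }

    public static void main(String[] args) {
        System.out.println(new StaleSummaryCacheSketch().invalidateBeforeCommit());
        System.out.println(new StaleSummaryCacheSketch().invalidateAfterCommit());
    }
}
```

Running main prints IN_PROGRESS for the first ordering and HOLDING for the second, which mirrors the mismatch between the stage status and the task status in the JSON above.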