Return-Path:
X-Original-To: apmail-ambari-dev-archive@www.apache.org
Delivered-To: apmail-ambari-dev-archive@www.apache.org
Received: from mail.apache.org (hermes.apache.org [140.211.11.3])
	by minotaur.apache.org (Postfix) with SMTP id 0DA1218B2D
	for ; Tue, 26 May 2015 19:33:18 +0000 (UTC)
Received: (qmail 27783 invoked by uid 500); 26 May 2015 19:33:17 -0000
Delivered-To: apmail-ambari-dev-archive@ambari.apache.org
Received: (qmail 27756 invoked by uid 500); 26 May 2015 19:33:17 -0000
Mailing-List: contact dev-help@ambari.apache.org; run by ezmlm
Precedence: bulk
List-Help:
List-Unsubscribe:
List-Post:
List-Id:
Reply-To: dev@ambari.apache.org
Delivered-To: mailing list dev@ambari.apache.org
Received: (qmail 27744 invoked by uid 99); 26 May 2015 19:33:17 -0000
Received: from arcas.apache.org (HELO arcas.apache.org) (140.211.11.28)
	by apache.org (qpsmtpd/0.29) with ESMTP; Tue, 26 May 2015 19:33:17 +0000
Date: Tue, 26 May 2015 19:33:17 +0000 (UTC)
From: "John Speidel (JIRA)"
To: dev@ambari.apache.org
Message-ID:
In-Reply-To:
References:
Subject: [jira] [Updated] (AMBARI-11394) Blueprint cluster provision occasionally fails due to out of order database writes
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394


     [ https://issues.apache.org/jira/browse/AMBARI-11394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

John Speidel updated AMBARI-11394:
----------------------------------
    Description:
Provisioning a cluster may occasionally fail to complete as a result of an out of order database write.
This error presents itself as start task(s) that never progress beyond the PENDING state. For these logical pending tasks, there are no associated physical tasks.
When a host is matched to a host request, an install request is submitted, followed immediately by a start request.
The install task transitions the desired_state of all host components on the host from INIT to INSTALLED. But, because of an error in the persistence layer, after the desired_state is set to INSTALLED, it is overwritten with INIT on another thread (the heartbeat handler thread). As a result, the component is never started, because its desired state is INIT and it isn't processed by the start operation.
The root cause is that the public method ServiceComponentHostImpl.handleEvent() is annotated with '@Transactional'. Inside this method the proper locks are acquired, BUT because the method is marked as @Transactional, its invocation is wrapped in a proxy which wraps the method invocation in a transaction. As a result, the transaction is committed in the proxy after the method returns, outside of any synchronization, which allows for out of order writes.

  was:
Provisioning a cluster may occasionally fail to complete as a result of an out of order database write.
This error presents itself as start task(s) that never progress beyond the PENDING state. For these logical pending tasks, there are no associated physical tasks.
When a host is matched to a host request, an install request is submitted, followed immediately by a start request. The install task transitions the desired_state of all host components on the host from INIT to INSTALLED. But, because of an error in the persistence layer, after the desired_state is set to INSTALLED, it is overwritten with INIT on another thread (the heartbeat handler thread). As a result, the component is never started, because its desired state is INIT and it isn't processed by the start operation.
The root cause is that the public method ServiceComponentHostImpl.handleEvent() is annotated with '@Transactional'. Inside this method the proper locks are acquired, BUT because the method is marked as @Transactional, its invocation is wrapped in a proxy which starts and commits a transaction around the method.
As a result, the transaction is committed in the proxy, outside of any synchronization, which allows for out of order writes.


> Blueprint cluster provision occasionally fails due to out of order database writes
> ----------------------------------------------------------------------------------
>
>                 Key: AMBARI-11394
>                 URL: https://issues.apache.org/jira/browse/AMBARI-11394
>             Project: Ambari
>          Issue Type: Bug
>    Affects Versions: 2.1.0
>            Reporter: John Speidel
>            Assignee: John Speidel
>             Fix For: 2.1.0
>
>
> Provisioning a cluster may occasionally fail to complete as a result of an out of order database write.
> This error presents itself as start task(s) that never progress beyond the PENDING state. For these logical pending tasks, there are no associated physical tasks.
> When a host is matched to a host request, an install request is submitted, followed immediately by a start request. The install task transitions the desired_state of all host components on the host from INIT to INSTALLED. But, because of an error in the persistence layer, after the desired_state is set to INSTALLED, it is overwritten with INIT on another thread (the heartbeat handler thread). As a result, the component is never started, because its desired state is INIT and it isn't processed by the start operation.
> The root cause is that the public method ServiceComponentHostImpl.handleEvent() is annotated with '@Transactional'. Inside this method the proper locks are acquired, BUT because the method is marked as @Transactional, its invocation is wrapped in a proxy which wraps the method invocation in a transaction. As a result, the transaction is committed in the proxy after the method returns, outside of any synchronization, which allows for out of order writes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
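
Illustrative note on the root cause described in the issue: the ordering problem can be sketched in plain Java without any persistence framework. This is not Ambari code. The names CommitOrderSketch, FakeDb and run() are hypothetical, the "database" is a single in-memory field, and the latches exist only to force the unlucky interleaving that the issue says happens occasionally. The sketch assumes both threads take the same lock in the same order in both variants; the only difference is whether the write is committed before or after the lock is released.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.locks.ReentrantLock;

public class CommitOrderSketch {

    /** Hypothetical stand-in for the persisted desired_state column. */
    private static final class FakeDb {
        private volatile String desiredState = "INIT";
        void commit(String value) { desiredState = value; }
        String read() { return desiredState; }
    }

    /**
     * The heartbeat thread takes the lock first and the install thread second.
     * If commitInsideLock is false, each commit happens after unlock, the way
     * a transactional proxy commits after the wrapped method has returned.
     */
    public static String run(boolean commitInsideLock) {
        FakeDb db = new FakeDb();
        ReentrantLock lock = new ReentrantLock();
        CountDownLatch heartbeatUnlocked = new CountDownLatch(1);
        CountDownLatch installCommitted = new CountDownLatch(1);

        Thread heartbeat = new Thread(() -> {
            lock.lock();
            try {
                if (commitInsideLock) db.commit("INIT");
            } finally {
                lock.unlock();
            }
            heartbeatUnlocked.countDown();
            if (!commitInsideLock) {
                await(installCommitted); // forced delay between unlock and commit
                db.commit("INIT");       // late commit overwrites INSTALLED
            }
        });

        Thread install = new Thread(() -> {
            await(heartbeatUnlocked);    // install always locks second
            lock.lock();
            try {
                if (commitInsideLock) db.commit("INSTALLED");
            } finally {
                lock.unlock();
            }
            if (!commitInsideLock) db.commit("INSTALLED");
            installCommitted.countDown();
        });

        heartbeat.start();
        install.start();
        join(heartbeat);
        join(install);
        return db.read();
    }

    private static void await(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    private static void join(Thread thread) {
        try {
            thread.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("commit outside lock -> " + run(false)); // INIT
        System.out.println("commit inside lock  -> " + run(true));  // INSTALLED
    }
}
```

With the commit outside the lock, the thread that held the lock last (install) does not get the last write, so the final state is INIT. With the commit inside the lock, the commit order matches the lock order and the final state is INSTALLED. Keeping the commit under the same synchronization as the state change is one way to avoid the inversion; the sketch does not claim this is how the issue was resolved.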