ambari-dev mailing list archives

From "John Speidel (JIRA)" <>
Subject [jira] [Created] (AMBARI-11394) Blueprint cluster provision occasionally fails due to out of order database writes
Date Tue, 26 May 2015 19:28:18 GMT
John Speidel created AMBARI-11394:

             Summary: Blueprint cluster provision occasionally fails due to out of order database writes
                 Key: AMBARI-11394
             Project: Ambari
          Issue Type: Bug
    Affects Versions: 2.1.0
            Reporter: John Speidel
            Assignee: John Speidel
             Fix For: 2.1.0

Provisioning a cluster may occasionally fail to complete as a result of out of order database writes.

This error presents itself as start task(s) that never progress beyond the PENDING state.
For these logical pending tasks, there are no associated physical tasks.

When a host is matched to a host request, an install request is submitted, followed immediately
by a start request. The install task transitions the desired_state of every host component on
the host from INIT to INSTALLED. But, because of an error in the persistence layer, after the
desired_state is set to INSTALLED, another thread (the heartbeat handler thread) overwrites it
with INIT. As a result, the component is never started: its desired state is INIT, so it is
not picked up by the start operation.
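The lost update can be replayed deterministically in a single thread. The sketch below is hypothetical: `LostUpdateReplay`, `Row`, `setDesiredState`, `setLiveState` and `commit` are illustrative names, not Ambari APIs, and it assumes the heartbeat path persists a snapshot of the whole row (including a stale desired_state) rather than only the field it changed. Each path mutates the in-memory entity under a lock, but its database write happens after the lock is released, so the order of writes need not match the order of mutations.

```java
import java.util.concurrent.locks.ReentrantLock;

/**
 * Hypothetical single-threaded replay of the interleaving described above.
 * Each path mutates the in-memory entity under a lock and snapshots the row
 * to persist, but the write to the "database" happens later, after the lock
 * has already been released.
 */
public class LostUpdateReplay {
    enum State { INIT, INSTALLED, STARTED }

    /** Hypothetical stand-in for a persisted host component row. */
    static final class Row {
        final State desiredState;
        final State liveState;

        Row(State desiredState, State liveState) {
            this.desiredState = desiredState;
            this.liveState = liveState;
        }
    }

    private final ReentrantLock lock = new ReentrantLock();
    private State desiredState = State.INIT;                 // in-memory entity fields
    private State liveState = State.INIT;
    private Row committed = new Row(State.INIT, State.INIT); // last row written to the DB

    /** Install path: set desired_state under the lock and snapshot the whole row. */
    Row setDesiredState(State s) {
        lock.lock();
        try {
            desiredState = s;
            return new Row(desiredState, liveState);
        } finally {
            lock.unlock();
        }
    }

    /** Heartbeat path: update the live state under the lock and snapshot the whole row. */
    Row setLiveState(State s) {
        lock.lock();
        try {
            liveState = s;
            return new Row(desiredState, liveState);
        } finally {
            lock.unlock();
        }
    }

    /** Deferred write, outside the lock: whichever commit lands last wins. */
    void commit(Row row) {
        committed = row;
    }

    /** The start operation only picks up components whose desired state is INSTALLED. */
    static boolean eligibleForStart(Row row) {
        return row.desiredState == State.INSTALLED;
    }

    static Row replay() {
        LostUpdateReplay shc = new LostUpdateReplay();
        Row fromHeartbeat = shc.setLiveState(State.INIT);       // snapshot still carries desired INIT
        Row fromInstall = shc.setDesiredState(State.INSTALLED); // desired INIT -> INSTALLED
        shc.commit(fromInstall);                                // install commits first
        shc.commit(fromHeartbeat);                              // stale heartbeat row lands last
        return shc.committed;
    }

    public static void main(String[] args) {
        Row row = replay();
        System.out.println("committed desired_state = " + row.desiredState); // INIT
        System.out.println("eligible for start = " + eligibleForStart(row));  // false
    }
}
```

Both mutations were correctly serialized by the lock; only the commits were reordered, which is enough to leave desired_state at INIT and the start task with nothing to schedule.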

The root cause is that the public method ServiceComponentHostImpl.handleEvent() is annotated
with '@Transactional'. The proper locks are acquired inside this method, BUT because the method
is marked @Transactional, its invocation is wrapped in a proxy that starts a transaction before
the method runs and commits it after the method returns. As a result, the transaction is
committed in the proxy after the locks have been released, outside of any synchronization,
which allows out of order writes.
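The effect of the proxy can be sketched with the JDK's `java.lang.reflect.Proxy` standing in for the AOP interceptor behind `@Transactional`. This is an illustration under assumptions: `HostComponent`, `LockedComponent`, `transactionalProxy` and the string trace are hypothetical, and "begin"/"commit" are only recorded, not executed against a database. The second variant scopes the transaction inside the lock, which is one way to keep commits in lock order; it is not necessarily the change made for this issue.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantLock;

public class ProxyCommitOrder {
    /** Hypothetical interface standing in for the proxied public method. */
    interface HostComponent {
        void handleEvent(String newDesiredState);
    }

    /** Records, in order, what happens around a single handleEvent call. */
    static final List<String> trace = new ArrayList<>();

    /** Hypothetical component that takes its lock inside handleEvent, as the issue describes. */
    static class LockedComponent implements HostComponent {
        private final ReentrantLock lock = new ReentrantLock();
        private final boolean commitInsideLock;

        LockedComponent(boolean commitInsideLock) {
            this.commitInsideLock = commitInsideLock;
        }

        @Override
        public void handleEvent(String newDesiredState) {
            lock.lock();
            trace.add("lock");
            try {
                if (commitInsideLock) trace.add("begin");
                trace.add("set " + newDesiredState);
                if (commitInsideLock) trace.add("commit"); // still holding the lock
            } finally {
                trace.add("unlock");
                lock.unlock();
            }
        }
    }

    /** Mimics a transactional interceptor: begin before the call, commit after it returns. */
    static HostComponent transactionalProxy(HostComponent target) {
        InvocationHandler handler = (proxy, method, args) -> {
            trace.add("begin");
            Object result = method.invoke(target, args);
            trace.add("commit"); // handleEvent has already released its lock here
            return result;
        };
        return (HostComponent) Proxy.newProxyInstance(
                HostComponent.class.getClassLoader(),
                new Class<?>[] {HostComponent.class},
                handler);
    }

    static List<String> run(boolean transactionInsideLock) {
        trace.clear();
        HostComponent component = transactionInsideLock
                ? new LockedComponent(true)                       // transaction scoped inside the lock
                : transactionalProxy(new LockedComponent(false)); // proxy wraps the whole method
        component.handleEvent("INSTALLED");
        return new ArrayList<>(trace);
    }

    public static void main(String[] args) {
        System.out.println("proxied:      " + run(false));
        System.out.println("inside lock:  " + run(true));
    }
}
```

In the proxied trace `commit` follows `unlock`, so two threads can acquire the lock in one order and commit in the other; in the second trace the commit happens before the lock is released, so commits follow lock order.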

This message was sent by Atlassian JIRA
