From: rafaelweingartner
To: dev@cloudstack.apache.org
Reply-To: dev@cloudstack.apache.org
Subject: [GitHub] cloudstack issue #1762: CLOUDSTACK-9595 Transactions are not getting retried...
Message-Id: <20161124175946.5998FDFCC7@git1-us-west.apache.org>
Date: Thu, 24 Nov 2016 17:59:46 +0000 (UTC)

Github user rafaelweingartner commented on the issue:

    https://github.com/apache/cloudstack/pull/1762

@serg38 I have just now started reading this PR (excuse me if I overlooked some information).
> If we are to try to implement a general way of dealing with deadlocks in ACS how could it be done to ensure DB consistency and correct transaction retry?

To answer your question: in my opinion, we should not “try” to implement a general way of managing transactions. We only have this type of problem because, instead of using a framework to manage database access and transactions, a module was developed for that purpose and incorporated into ACS; this means we have to maintain and live with this code. Now, the problem is that it would be a Dantesque task to change the way ACS manages transactions today.

I am with John on this one: retrying is not a good idea; it can hide problems, add overhead, and cause even more headaches. I think the best approach is to deal with this type of problem on the fly; that is, as John said, addressing them as bugs when they are reported.

Having said that, I have not helped a bit to solve the problem… Let’s see if I can be of any help. I was reading the ticket #CLOUDSTACK-9595. It seems that the problem reported there happened when a VM was being removed from the table “instance_group_vm_map”. What I do not understand is why, since the method called is “UserVmManagerImpl.addInstanceToGroup”. I hope this makes sense.

Anyway… The MySQL docs say the following about deadlocks:

> A deadlock is a situation where different transactions are unable to proceed because each holds a lock that the other needs

This means something else was being executed when that VM was deleted/added, and this caused the deadlock and the exception. Probably something else is using the table “instance_group_vm_map”. I think we should track down the two tasks/processes that can cause the problem and work them out, instead of looking for a generic way to deal with this situation. Maybe the processes causing the deadlock are locking tables they do not need, or executing some processing that could be avoided or modified.
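To make the quoted deadlock definition concrete, here is a toy simulation (not ACS code). It steps a set of transactions round-robin through the locks each one needs and reports which ones get stuck. The function name `find_deadlock` and the row names are hypothetical, and the lock table is a plain dictionary rather than anything MySQL actually does. It only illustrates why two tasks that take the same locks in opposite order can block each other, while the same tasks taking them in one agreed order cannot.

```python
def find_deadlock(schedules):
    """Simulate transactions taking locks one step at a time.

    `schedules` maps a transaction name to the ordered list of rows it locks.
    A transaction releases all its locks once it has taken the last one.
    Returns the sorted names of transactions that can never proceed
    (an empty list means every transaction finished).
    """
    owner = {}                                  # row -> transaction holding its lock
    position = {txn: 0 for txn in schedules}    # index of the next lock each one needs
    active = [txn for txn in schedules if schedules[txn]]

    while active:
        progressed = False
        for txn in list(active):
            row = schedules[txn][position[txn]]
            if owner.get(row, txn) != txn:
                continue                        # held by another transaction: wait
            owner[row] = txn
            position[txn] += 1
            progressed = True
            if position[txn] == len(schedules[txn]):
                # "Commit": release every lock this transaction holds.
                for held in [r for r, t in owner.items() if t == txn]:
                    del owner[held]
                active.remove(txn)
        if not progressed:
            return sorted(active)               # a full round with no progress: deadlock
    return []


# Opposite lock order: each transaction holds the row the other needs.
opposite = {"T1": ["row_x", "row_y"], "T2": ["row_y", "row_x"]}
# Same lock order: T2 simply waits until T1 is done.
same = {"T1": ["row_x", "row_y"], "T2": ["row_x", "row_y"]}

print(find_deadlock(opposite))  # ['T1', 'T2']
print(find_deadlock(same))      # []
```

If the two processes touching “instance_group_vm_map” turn out to follow the first pattern, reordering or narrowing their locks would be one way to fix them case by case. That is an assumption about the cause, not something the ticket confirms.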
Do we have a use case that can reproduce the problem?