cloudstack-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Likitha Shetty <likitha.she...@citrix.com>
Subject RE: "Bug" introduced by part of commit a600d8408ea86782318139c17cf346c8497943d0
Date Thu, 10 Jul 2014 15:34:36 GMT
Are you guys are referring to the update method that is throwing the error, 'update(long id,
long runid, Date lastUpdate) ?. Agreed that this is just a symptom.

But I was referring to the first update method in the file which is 'update(long id, long
runid, String name, String version, String serviceIP, int servicePort, Date lastUpdate)'.
>From what I understand this is the method that updates the MS runId when it is started.
And the problem is that the transaction commit made in this update is getting rolled back,
which is why we have a wrong runId in the DB for the MS. Since the second update method is
based on the right runId it throws an exception.
And I suspect the original transaction is getting rolled back because we are trying to close
a transaction that is already closed.

Thanks,
Likitha

-----Original Message-----
From: Funs Kessen [mailto:FKessen@schubergphilis.com] 
Sent: Thursday, July 10, 2014 5:10 PM
To: dev@cloudstack.apache.org
Subject: RE: "Bug" introduced by part of commit a600d8408ea86782318139c17cf346c8497943d0

Hi Santhosh,

You're right indeed, it's the update statement in the update itself that adds the runid in
the where clause, it came in three years and three months ago by the looks of it.

Cheers,

Funs

> -----Original Message-----
> From: Santhosh Edukulla [mailto:santhosh.edukulla@citrix.com]
> Sent: Thursday, July 10, 2014 1:22 PM
> To: dev@cloudstack.apache.org
> Subject: RE: "Bug" introduced by part of commit
> a600d8408ea86782318139c17cf346c8497943d0
> 
> The mentioned issue by Funs was existing earlier as well but was 
> hidden by just logging it when update was failed, now with new change, 
> we resurfaced the problem by throwing it. Its not because of close 
> though. We can see to fix the original problem.
> 
> Santhosh
> ________________________________________
> From: Likitha Shetty [likitha.shetty@citrix.com]
> Sent: Thursday, July 10, 2014 7:18 AM
> To: dev@cloudstack.apache.org
> Subject: RE: "Bug" introduced by part of commit
> a600d8408ea86782318139c17cf346c8497943d0
> 
> Santhosh, I pulled the latest code and unfortunately the issue still exists.
> 
> Looks like the problem is because we try to close the transaction 
> after it has been committed (which implicitly closes the transaction) 
> - /framework/cluster/src/com/cloud/cluster/dao/ManagementServerHostDao
> Impl.java
>             txn.commit();
>         } catch (RuntimeException e) {
>             txn.rollback();
>             s_logger.warn("Unexpected exception, ", e);
>             throw new RuntimeException(e.getMessage(), e);
>         }finally {
>             txn.close();
>         }
> 
> Thanks,
> Likitha
> 
> -----Original Message-----
> From: Funs Kessen [mailto:FKessen@schubergphilis.com]
> Sent: Thursday, July 10, 2014 2:38 PM
> To: dev@cloudstack.apache.org
> Subject: RE: "Bug" introduced by part of commit
> a600d8408ea86782318139c17cf346c8497943d0
> 
> Yeah I see that Ipasted the wrong bit of code *doh*, it was update 
> that was the problem from the same file:
> 
> @@ -129,24 +133,29 @@
>      @DB
>      public void update(long id, long runid, Date lastUpdate) {
>          TransactionLegacy txn = TransactionLegacy.currentTxn();
> -        PreparedStatement pstmt = null;
>          try {
>              txn.start();
> +            try( PreparedStatement pstmt = 
> + txn.prepareStatement("update mshost
> set last_update=?, removed=null, alert_count=0 where id=? and 
> runid=?");) {
> +                pstmt.setString(1,
> DateUtil.getDateDisplayString(TimeZone.getTimeZone("GMT"), 
> lastUpdate));
> +                pstmt.setLong(2, id);
> +                pstmt.setLong(3, runid);
> 
> -            pstmt = txn.prepareAutoCloseStatement("update mshost set
> last_update=?, removed=null, alert_count=0 where id=? and runid=?");
> -            pstmt.setString(1,
> DateUtil.getDateDisplayString(TimeZone.getTimeZone("GMT"), lastUpdate));
> -            pstmt.setLong(2, id);
> -            pstmt.setLong(3, runid);
> +                int count = pstmt.executeUpdate();
> +                txn.commit();
> 
> -            int count = pstmt.executeUpdate();
> -            txn.commit();
> -
> -            if (count < 1) {
> -                throw new CloudRuntimeException("Invalid cluster session
> detected", new ClusterInvalidSessionException("runid " + runid + " is 
> no longer valid"));
> +                if (count < 1) {
> +                    throw new CloudRuntimeException("Invalid cluster 
> + session
> detected", new ClusterInvalidSessionException("runid " + runid + " is 
> no longer valid"));
> +                }
> +            }catch (SQLException e) {
> +                throw new
> + CloudRuntimeException("update:Exception:"+e.getMessage(), e);
>              }
> -        } catch (Exception e) {
> -            s_logger.warn("Unexpected exception, ", e);
> -            throw new RuntimeException(e.getMessage(), e);
> +        } catch (RuntimeException e) {
> +            txn.rollback();
> +            s_logger.warn("update:Exception:"+e.getMessage(), e);
> +            throw new RuntimeException("update:Exception:"+e.getMessage(), e);
> +        }
> +        finally {
> +            txn.close();
>          }
>      }
> 
> 
> > -----Original Message-----
> > From: Trippie [mailto:trippie@gmail.com] On Behalf Of Hugo Trippaers
> > Sent: Thursday, July 10, 2014 11:00 AM
> > To: dev@cloudstack.apache.org
> > Subject: Re: "Bug" introduced by part of commit
> > a600d8408ea86782318139c17cf346c8497943d0
> >
> > Fixed already, good job Santosh
> >
> > Cheers,
> >
> > Hugo
> >
> > On 10 jul. 2014, at 10:58, Hugo Trippaers <trippie@gmail.com> wrote:
> >
> > > Doesn't seem related to that particular commit, the function
> > invalidateRunSession appears to be dead code.
> > >
> > > I'll do some digging.
> > >
> > >
> > > Cheers,
> > >
> > > Hugo
> > >
> > > On 10 jul. 2014, at 10:40, Funs Kessen 
> > > <FKessen@schubergphilis.com>
> > wrote:
> > >
> > >> Hi Devs,
> > >>
> > >> I recently noticed on master that after starting, stopping and 
> > >> starting the
> > management server again I get stacktraces about every second.
> > >> After some digging with Daan we found that part of the
> > a600d8408ea86782318139c17cf346c84979943d0 commit causes this.
> The
> > problem is however deeper rooted, as the code is supposed to update 
> > the mshost table, but because another Session ID (actually _runId 
> > from ClusterManagerImpl.java from the previous run)  is already in 
> > there the update fails. The Session ID is based on time in 
> > milliseconds, and offcourse changes when you stop and start the management server
again.
> > >> Prior to the commit it failed silently, and has done so since the 
> > >> initial
> > checkin it seems. The real question is what the original idea is 
> > behind the _runId, and is it something that requires fixing ?
> > >>
> > >> I've pasted part of the commit, sql and stacktrace below.
> > >>
> > >> Cheers,
> > >>
> > >> Funs
> > >>
> > >> ===
> > >> mysql> select * from mshost;
> > >> +----+----------------+---------------+---------+-------+----------------+------------+--------
> --
> > ----+---------------------+---------+-------------+
> > >> | id | msid           | runid         | name    | state | version     
  | service_ip |
> > service_port | last_update         | removed | alert_count |
> > >> +----+----------------+---------------+---------+-------+----------------+------------+--------
> --
> > ----+---------------------+---------+-------------+
> > >> |  1 | 90520734207775 | 1404924979461 | cs-mgmt | Up    | 4.5.0-
> > SNAPSHOT | 127.0.0.1  |         9090 | 2014-07-09 16:58:07 | NULL    |         
 0
> |
> > >> +----+----------------+---------------+---------+-------+----------------+------------+--------
> --
> > ----+---------------------+---------+-------------+
> > >> 1 row in set (0.00 sec)
> > >>
> > >> ===
> > >> commit a600d8408ea86782318139c17cf346c8497943d0
> > >> Author: Santhosh Edukulla <santhosh.edukulla@gmail.com> 2014-07-
> 02
> > >> 10:38:16
> > >> Committer: Santhosh Edukulla <santhosh.edukulla@gmail.com> 2014-
> 07-
> > 04
> > >> 12:47:58
> > >>
> > >> Fixed Resource Leaks, null dereferences, few other issues 
> > >> reported by coverity
> > >>
> > >> -
> > >>
> >
> framework/cluster/src/com/cloud/cluster/dao/ManagementServerHostDaoI
> > m
> > >> pl.java diff --git
> > >>
> >
> a/framework/cluster/src/com/cloud/cluster/dao/ManagementServerHostDa
> > o
> > >> Impl.java
> > >>
> >
> b/framework/cluster/src/com/cloud/cluster/dao/ManagementServerHostDa
> > o
> > >> Impl.java
> > >> index 3d0c3f5..89d7d27 100644
> > >> ---
> > >>
> >
> a/framework/cluster/src/com/cloud/cluster/dao/ManagementServerHostDa
> > o
> > >> Impl.java
> > >> +++
> > b/framework/cluster/src/com/cloud/cluster/dao/ManagementServerHos
> > >> +++ tDaoImpl.java
> > >> @@ -53,15 +53,14 @@
> > >>    @Override
> > >>    public void invalidateRunSession(long id, long runid) {
> > >>        TransactionLegacy txn = TransactionLegacy.currentTxn();
> > >> -        PreparedStatement pstmt = null;
> > >> -        try {
> > >> -            pstmt = txn.prepareAutoCloseStatement("update mshost set
> > runid=0, state='Down' where id=? and runid=?");
> > >> -            pstmt.setLong(1, id);
> > >> -            pstmt.setLong(2, runid);
> > >> -
> > >> -            pstmt.executeUpdate();
> > >> +        try (PreparedStatement pstmt = 
> > >> + txn.prepareStatement("update
> > mshost set runid=0, state='Down' where id=? and runid=?");){
> > >> +            if(pstmt != null) {
> > >> +                pstmt.setLong(1, id);
> > >> +                pstmt.setLong(2, runid);
> > >> +                pstmt.executeUpdate();
> > >> +            }
> > >>        } catch (SQLException e) {
> > >> -            throw new CloudRuntimeException("DB exception on " +
> > pstmt.toString(), e);
> > >> +            throw new
> > >> + CloudRuntimeException("invalidateRunSession:Exception:"+
> > >> + e.getMessage(), e);
> > >>        }
> > >>    }
> > >>
> > >> ===
> > >> 2014-07-09 18:45:06,541 WARN
> [c.c.c.d.ManagementServerHostDaoImpl]
> > >> (Cluster-Heartbeat-1:ctx-5f2f8ad5) update:Exception:Invalid 
> > >> cluster session detected
> > >> com.cloud.utils.exception.CloudRuntimeException: Invalid cluster 
> > >> session
> > detected
> > >>       at
> >
> com.cloud.cluster.dao.ManagementServerHostDaoImpl.update(Management
> > ServerHostDaoImpl.java:147)
> > >>       at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> > >>       at
> >
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.j
> > a
> > va:57)
> > >>       at
> >
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccess
> > orImpl.java:43)
> > >>       at java.lang.reflect.Method.invoke(Method.java:606)
> > >>       at
> > org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflect
> > io
> > n(
> > AopUtils.java:317)
> > >>       at
> > org.springframework.aop.framework.ReflectiveMethodInvocation.invokeJ
> > oi
> > n
> > point(ReflectiveMethodInvocation.java:183)
> > >>       at
> >
> org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(R
> > eflectiveMethodInvocation.java:150)
> > >>       at
> > com.cloud.utils.db.TransactionContextInterceptor.invoke(TransactionC
> > on
> > tex
> > tInterceptor.java:34)
> > >>       at
> >
> org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(R
> > eflectiveMethodInvocation.java:161)
> > >>       at
> > org.springframework.aop.interceptor.ExposeInvocationInterceptor.invo
> > ke
> > (Ex
> > poseInvocationInterceptor.java:91)
> > >>       at
> >
> org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(R
> > eflectiveMethodInvocation.java:172)
> > >>       at
> >
> org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynam
> > icAopProxy.java:204)
> > >>       at com.sun.proxy.$Proxy158.update(Unknown Source)
> > >>       at
> >
> com.cloud.cluster.ClusterManagerImpl$4.runInContext(ClusterManagerImpl
> > .java:545)
> > >>       at
> >
> org.apache.cloudstack.managed.context.ManagedContextRunnable$1.run(M
> > anagedContextRunnable.java:49)
> > >>       at
> >
> org.apache.cloudstack.managed.context.impl.DefaultManagedContext$1.cal
> > l
> > (DefaultManagedContext.java:56)
> > >>       at
> >
> org.apache.cloudstack.managed.context.impl.DefaultManagedContext.callW
> > ithContext(DefaultManagedContext.java:103)
> > >>       at
> >
> org.apache.cloudstack.managed.context.impl.DefaultManagedContext.runWi
> > thContext(DefaultManagedContext.java:53)
> > >>       at
> >
> org.apache.cloudstack.managed.context.ManagedContextRunnable.run(Man
> > agedContextRunnable.java:46)
> > >>       at
> > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:4
> > 71
> > )
> > >>       at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304)
> > >>       at
> > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask
> > .a
> > c
> > cess$301(ScheduledThreadPoolExecutor.java:178)
> > >>       at
> > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask
> > .r
> > u
> > n(ScheduledThreadPoolExecutor.java:293)
> > >>       at
> > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor
> > .j
> > av
> > a:1145)
> > >>       at
> > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.
> > ja
> > va:615)
> > >>       at java.lang.Thread.run(Thread.java:744)
> > >> Caused by: com.cloud.cluster.ClusterInvalidSessionException: 
> > >> runid
> > 1404924278923 is no longer valid
> > >>       at
> >
> com.cloud.cluster.dao.ManagementServerHostDaoImpl.update(Management
> > ServerHostDaoImpl.java:147)
> > >>       ... 26 more
> > >> 2014-07-09 18:45:06,543 ERROR [c.c.c.ClusterManagerImpl]
> > >> (Cluster-Heartbeat-1:ctx-5f2f8ad5) Unexpected exception in 
> > >> cluster heartbeat
> > >> java.lang.RuntimeException: update:Exception:Invalid cluster 
> > >> session
> > detected
> > >>       at
> >
> com.cloud.cluster.dao.ManagementServerHostDaoImpl.update(Management
> > ServerHostDaoImpl.java:155)
> > >>       at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> > >>       at
> >
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.j
> > a
> > va:57)
> > >>       at
> >
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccess
> > orImpl.java:43)
> > >>       at java.lang.reflect.Method.invoke(Method.java:606)
> > >>       at
> > org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflect
> > io
> > n(
> > AopUtils.java:317)
> > >>       at
> > org.springframework.aop.framework.ReflectiveMethodInvocation.invokeJ
> > oi
> > n
> > point(ReflectiveMethodInvocation.java:183)
> > >>       at
> >
> org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(R
> > eflectiveMethodInvocation.java:150)
> > >>       at
> > com.cloud.utils.db.TransactionContextInterceptor.invoke(TransactionC
> > on
> > tex
> > tInterceptor.java:34)
> > >>       at
> >
> org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(R
> > eflectiveMethodInvocation.java:161)
> > >>       at
> > org.springframework.aop.interceptor.ExposeInvocationInterceptor.invo
> > ke
> > (Ex
> > poseInvocationInterceptor.java:91)
> > >>       at
> >
> org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(R
> > eflectiveMethodInvocation.java:172)
> > >>       at
> >
> org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynam
> > icAopProxy.java:204)
> > >>       at com.sun.proxy.$Proxy158.update(Unknown Source)
> > >>       at
> >
> com.cloud.cluster.ClusterManagerImpl$4.runInContext(ClusterManagerImpl
> > .java:545)
> > >>       at
> >
> org.apache.cloudstack.managed.context.ManagedContextRunnable$1.run(M
> > anagedContextRunnable.java:49)
> > >>       at
> >
> org.apache.cloudstack.managed.context.impl.DefaultManagedContext$1.cal
> > l
> > (DefaultManagedContext.java:56)
> > >>       at
> >
> org.apache.cloudstack.managed.context.impl.DefaultManagedContext.callW
> > ithContext(DefaultManagedContext.java:103)
> > >>       at
> >
> org.apache.cloudstack.managed.context.impl.DefaultManagedContext.runWi
> > thContext(DefaultManagedContext.java:53)
> > >>       at
> >
> org.apache.cloudstack.managed.context.ManagedContextRunnable.run(Man
> > agedContextRunnable.java:46)
> > >>       at
> > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:4
> > 71
> > )
> > >>       at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304)
> > >>       at
> > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask
> > .a
> > c
> > cess$301(ScheduledThreadPoolExecutor.java:178)
> > >>       at
> > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask
> > .r
> > u
> > n(ScheduledThreadPoolExecutor.java:293)
> > >>       at
> > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor
> > .j
> > av
> > a:1145)
> > >>       at
> > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.
> > ja
> > va:615)
> > >>       at java.lang.Thread.run(Thread.java:744)
> > >> Caused by: com.cloud.utils.exception.CloudRuntimeException: 
> > >> Invalid
> > cluster session detected
> > >>       at
> >
> com.cloud.cluster.dao.ManagementServerHostDaoImpl.update(Management
> > ServerHostDaoImpl.java:147)
> > >>       ... 26 more
> > >> Caused by: com.cloud.cluster.ClusterInvalidSessionException: 
> > >> runid
> > 1404924278923 is no longer valid
> > >>       at
> >
> com.cloud.cluster.dao.ManagementServerHostDaoImpl.update(Management
> > ServerHostDaoImpl.java:147)
> > >>       ... 26 more
> > >>
> > >>
> > >>
> > >


Mime
View raw message