cloudstack-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Funs Kessen <FKes...@schubergphilis.com>
Subject RE: "Bug" introduced by part of commit a600d8408ea86782318139c17cf346c8497943d0
Date Thu, 10 Jul 2014 16:18:48 GMT
Hi Likitha,

You are absolutely spot on! This is the error message I found:
INFO  [c.c.c.ClusterManagerImpl] (main:null) Management server 90520734207775 is being started
WARN  [c.c.u.d.T.Transaction] (main:null) txn: Commit called when it is not a transaction:
-Transaction.execute:38-Transaction.execute:46-ClusterManagerImpl.start:940-CloudStackExtendedLifeCycle$1.with:75-CloudStackExtendedLifeCycle.with:153-CloudStackExtendedLifeCycle.startBeans:72-CloudStackExtendedLifeCycleStart.run:46-DefaultModuleDefinitionSet$1.with:105-DefaultModuleDefinitionSet.withModule:245-DefaultModuleDefinitionSet.withModule:250-DefaultModuleDefinitionSet.withModule:250-DefaultModuleDefinitionSet.withModule:233
INFO  [c.c.c.ClusterManagerImpl] (main:null) Management server (host id : 1) is being started
at 127.0.0.1:9090

Cheers,

Funs

> -----Original Message-----
> From: Likitha Shetty [mailto:likitha.shetty@citrix.com]
> Sent: Thursday, July 10, 2014 5:35 PM
> To: dev@cloudstack.apache.org
> Subject: RE: "Bug" introduced by part of commit
> a600d8408ea86782318139c17cf346c8497943d0
> 
> Are you guys are referring to the update method that is throwing the error,
> 'update(long id, long runid, Date lastUpdate) ?. Agreed that this is just a
> symptom.
> 
> But I was referring to the first update method in the file which is
> 'update(long id, long runid, String name, String version, String serviceIP, int
> servicePort, Date lastUpdate)'.
> From what I understand this is the method that updates the MS runId when it
> is started. And the problem is that the transaction commit made in this
> update is getting rolled back, which is why we have a wrong runId in the DB
> for the MS. Since the second update method is based on the right runId it
> throws an exception.
> And I suspect the original transaction is getting rolled back because we are
> trying to close a transaction that is already closed.
> 
> Thanks,
> Likitha
> 
> -----Original Message-----
> From: Funs Kessen [mailto:FKessen@schubergphilis.com]
> Sent: Thursday, July 10, 2014 5:10 PM
> To: dev@cloudstack.apache.org
> Subject: RE: "Bug" introduced by part of commit
> a600d8408ea86782318139c17cf346c8497943d0
> 
> Hi Santhosh,
> 
> You're right indeed, it's the update statement in the update itself that adds
> the runid in the where clause, it came in three years and three months ago by
> the looks of it.
> 
> Cheers,
> 
> Funs
> 
> > -----Original Message-----
> > From: Santhosh Edukulla [mailto:santhosh.edukulla@citrix.com]
> > Sent: Thursday, July 10, 2014 1:22 PM
> > To: dev@cloudstack.apache.org
> > Subject: RE: "Bug" introduced by part of commit
> > a600d8408ea86782318139c17cf346c8497943d0
> >
> > The mentioned issue by Funs was existing earlier as well but was
> > hidden by just logging it when update was failed, now with new change,
> > we resurfaced the problem by throwing it. Its not because of close
> > though. We can see to fix the original problem.
> >
> > Santhosh
> > ________________________________________
> > From: Likitha Shetty [likitha.shetty@citrix.com]
> > Sent: Thursday, July 10, 2014 7:18 AM
> > To: dev@cloudstack.apache.org
> > Subject: RE: "Bug" introduced by part of commit
> > a600d8408ea86782318139c17cf346c8497943d0
> >
> > Santhosh, I pulled the latest code and unfortunately the issue still exists.
> >
> > Looks like the problem is because we try to close the transaction
> > after it has been committed (which implicitly closes the transaction)
> > -
> /framework/cluster/src/com/cloud/cluster/dao/ManagementServerHostDao
> > Impl.java
> >             txn.commit();
> >         } catch (RuntimeException e) {
> >             txn.rollback();
> >             s_logger.warn("Unexpected exception, ", e);
> >             throw new RuntimeException(e.getMessage(), e);
> >         }finally {
> >             txn.close();
> >         }
> >
> > Thanks,
> > Likitha
> >
> > -----Original Message-----
> > From: Funs Kessen [mailto:FKessen@schubergphilis.com]
> > Sent: Thursday, July 10, 2014 2:38 PM
> > To: dev@cloudstack.apache.org
> > Subject: RE: "Bug" introduced by part of commit
> > a600d8408ea86782318139c17cf346c8497943d0
> >
> > Yeah I see that Ipasted the wrong bit of code *doh*, it was update
> > that was the problem from the same file:
> >
> > @@ -129,24 +133,29 @@
> >      @DB
> >      public void update(long id, long runid, Date lastUpdate) {
> >          TransactionLegacy txn = TransactionLegacy.currentTxn();
> > -        PreparedStatement pstmt = null;
> >          try {
> >              txn.start();
> > +            try( PreparedStatement pstmt =
> > + txn.prepareStatement("update mshost
> > set last_update=?, removed=null, alert_count=0 where id=? and
> > runid=?");) {
> > +                pstmt.setString(1,
> > DateUtil.getDateDisplayString(TimeZone.getTimeZone("GMT"),
> > lastUpdate));
> > +                pstmt.setLong(2, id);
> > +                pstmt.setLong(3, runid);
> >
> > -            pstmt = txn.prepareAutoCloseStatement("update mshost set
> > last_update=?, removed=null, alert_count=0 where id=? and runid=?");
> > -            pstmt.setString(1,
> > DateUtil.getDateDisplayString(TimeZone.getTimeZone("GMT"), lastUpdate));
> > -            pstmt.setLong(2, id);
> > -            pstmt.setLong(3, runid);
> > +                int count = pstmt.executeUpdate();
> > +                txn.commit();
> >
> > -            int count = pstmt.executeUpdate();
> > -            txn.commit();
> > -
> > -            if (count < 1) {
> > -                throw new CloudRuntimeException("Invalid cluster session
> > detected", new ClusterInvalidSessionException("runid " + runid + " is
> > no longer valid"));
> > +                if (count < 1) {
> > +                    throw new CloudRuntimeException("Invalid cluster
> > + session
> > detected", new ClusterInvalidSessionException("runid " + runid + " is
> > no longer valid"));
> > +                }
> > +            }catch (SQLException e) {
> > +                throw new
> > + CloudRuntimeException("update:Exception:"+e.getMessage(), e);
> >              }
> > -        } catch (Exception e) {
> > -            s_logger.warn("Unexpected exception, ", e);
> > -            throw new RuntimeException(e.getMessage(), e);
> > +        } catch (RuntimeException e) {
> > +            txn.rollback();
> > +            s_logger.warn("update:Exception:"+e.getMessage(), e);
> > +            throw new RuntimeException("update:Exception:"+e.getMessage(),
> e);
> > +        }
> > +        finally {
> > +            txn.close();
> >          }
> >      }
> >
> >
> > > -----Original Message-----
> > > From: Trippie [mailto:trippie@gmail.com] On Behalf Of Hugo Trippaers
> > > Sent: Thursday, July 10, 2014 11:00 AM
> > > To: dev@cloudstack.apache.org
> > > Subject: Re: "Bug" introduced by part of commit
> > > a600d8408ea86782318139c17cf346c8497943d0
> > >
> > > Fixed already, good job Santosh
> > >
> > > Cheers,
> > >
> > > Hugo
> > >
> > > On 10 jul. 2014, at 10:58, Hugo Trippaers <trippie@gmail.com> wrote:
> > >
> > > > Doesn't seem related to that particular commit, the function
> > > invalidateRunSession appears to be dead code.
> > > >
> > > > I'll do some digging.
> > > >
> > > >
> > > > Cheers,
> > > >
> > > > Hugo
> > > >
> > > > On 10 jul. 2014, at 10:40, Funs Kessen
> > > > <FKessen@schubergphilis.com>
> > > wrote:
> > > >
> > > >> Hi Devs,
> > > >>
> > > >> I recently noticed on master that after starting, stopping and
> > > >> starting the
> > > management server again I get stacktraces about every second.
> > > >> After some digging with Daan we found that part of the
> > > a600d8408ea86782318139c17cf346c84979943d0 commit causes this.
> > The
> > > problem is however deeper rooted, as the code is supposed to update
> > > the mshost table, but because another Session ID (actually _runId
> > > from ClusterManagerImpl.java from the previous run)  is already in
> > > there the update fails. The Session ID is based on time in
> > > milliseconds, and offcourse changes when you stop and start the
> management server again.
> > > >> Prior to the commit it failed silently, and has done so since the
> > > >> initial
> > > checkin it seems. The real question is what the original idea is
> > > behind the _runId, and is it something that requires fixing ?
> > > >>
> > > >> I've pasted part of the commit, sql and stacktrace below.
> > > >>
> > > >> Cheers,
> > > >>
> > > >> Funs
> > > >>
> > > >> ===
> > > >> mysql> select * from mshost;
> > > >> +----+----------------+---------------+---------+-------+----------------+------------+------
> --
> > --
> > > ----+---------------------+---------+-------------+
> > > >> | id | msid           | runid         | name    | state | version
       | service_ip
> |
> > > service_port | last_update         | removed | alert_count |
> > > >> +----+----------------+---------------+---------+-------+----------------+------------+------
> --
> > --
> > > ----+---------------------+---------+-------------+
> > > >> |  1 | 90520734207775 | 1404924979461 | cs-mgmt | Up    | 4.5.0-
> > > SNAPSHOT | 127.0.0.1  |         9090 | 2014-07-09 16:58:07 | NULL    |
> 0
> > |
> > > >> +----+----------------+---------------+---------+-------+----------------+------------+------
> --
> > --
> > > ----+---------------------+---------+-------------+
> > > >> 1 row in set (0.00 sec)
> > > >>
> > > >> ===
> > > >> commit a600d8408ea86782318139c17cf346c8497943d0
> > > >> Author: Santhosh Edukulla <santhosh.edukulla@gmail.com> 2014-07-
> > 02
> > > >> 10:38:16
> > > >> Committer: Santhosh Edukulla <santhosh.edukulla@gmail.com> 2014-
> > 07-
> > > 04
> > > >> 12:47:58
> > > >>
> > > >> Fixed Resource Leaks, null dereferences, few other issues
> > > >> reported by coverity
> > > >>
> > > >> -
> > > >>
> > >
> >
> framework/cluster/src/com/cloud/cluster/dao/ManagementServerHostDaoI
> > > m
> > > >> pl.java diff --git
> > > >>
> > >
> >
> a/framework/cluster/src/com/cloud/cluster/dao/ManagementServerHostDa
> > > o
> > > >> Impl.java
> > > >>
> > >
> >
> b/framework/cluster/src/com/cloud/cluster/dao/ManagementServerHostDa
> > > o
> > > >> Impl.java
> > > >> index 3d0c3f5..89d7d27 100644
> > > >> ---
> > > >>
> > >
> >
> a/framework/cluster/src/com/cloud/cluster/dao/ManagementServerHostDa
> > > o
> > > >> Impl.java
> > > >> +++
> > > b/framework/cluster/src/com/cloud/cluster/dao/ManagementServerHos
> > > >> +++ tDaoImpl.java
> > > >> @@ -53,15 +53,14 @@
> > > >>    @Override
> > > >>    public void invalidateRunSession(long id, long runid) {
> > > >>        TransactionLegacy txn = TransactionLegacy.currentTxn();
> > > >> -        PreparedStatement pstmt = null;
> > > >> -        try {
> > > >> -            pstmt = txn.prepareAutoCloseStatement("update mshost
set
> > > runid=0, state='Down' where id=? and runid=?");
> > > >> -            pstmt.setLong(1, id);
> > > >> -            pstmt.setLong(2, runid);
> > > >> -
> > > >> -            pstmt.executeUpdate();
> > > >> +        try (PreparedStatement pstmt =
> > > >> + txn.prepareStatement("update
> > > mshost set runid=0, state='Down' where id=? and runid=?");){
> > > >> +            if(pstmt != null) {
> > > >> +                pstmt.setLong(1, id);
> > > >> +                pstmt.setLong(2, runid);
> > > >> +                pstmt.executeUpdate();
> > > >> +            }
> > > >>        } catch (SQLException e) {
> > > >> -            throw new CloudRuntimeException("DB exception on " +
> > > pstmt.toString(), e);
> > > >> +            throw new
> > > >> + CloudRuntimeException("invalidateRunSession:Exception:"+
> > > >> + e.getMessage(), e);
> > > >>        }
> > > >>    }
> > > >>
> > > >> ===
> > > >> 2014-07-09 18:45:06,541 WARN
> > [c.c.c.d.ManagementServerHostDaoImpl]
> > > >> (Cluster-Heartbeat-1:ctx-5f2f8ad5) update:Exception:Invalid
> > > >> cluster session detected
> > > >> com.cloud.utils.exception.CloudRuntimeException: Invalid cluster
> > > >> session
> > > detected
> > > >>       at
> > >
> >
> com.cloud.cluster.dao.ManagementServerHostDaoImpl.update(Management
> > > ServerHostDaoImpl.java:147)
> > > >>       at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> > > >>       at
> > >
> >
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.j
> > > a
> > > va:57)
> > > >>       at
> > >
> >
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccess
> > > orImpl.java:43)
> > > >>       at java.lang.reflect.Method.invoke(Method.java:606)
> > > >>       at
> > > org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflect
> > > io
> > > n(
> > > AopUtils.java:317)
> > > >>       at
> > > org.springframework.aop.framework.ReflectiveMethodInvocation.invokeJ
> > > oi
> > > n
> > > point(ReflectiveMethodInvocation.java:183)
> > > >>       at
> > >
> >
> org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(R
> > > eflectiveMethodInvocation.java:150)
> > > >>       at
> > > com.cloud.utils.db.TransactionContextInterceptor.invoke(TransactionC
> > > on
> > > tex
> > > tInterceptor.java:34)
> > > >>       at
> > >
> >
> org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(R
> > > eflectiveMethodInvocation.java:161)
> > > >>       at
> > > org.springframework.aop.interceptor.ExposeInvocationInterceptor.invo
> > > ke
> > > (Ex
> > > poseInvocationInterceptor.java:91)
> > > >>       at
> > >
> >
> org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(R
> > > eflectiveMethodInvocation.java:172)
> > > >>       at
> > >
> >
> org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynam
> > > icAopProxy.java:204)
> > > >>       at com.sun.proxy.$Proxy158.update(Unknown Source)
> > > >>       at
> > >
> >
> com.cloud.cluster.ClusterManagerImpl$4.runInContext(ClusterManagerImpl
> > > .java:545)
> > > >>       at
> > >
> >
> org.apache.cloudstack.managed.context.ManagedContextRunnable$1.run(M
> > > anagedContextRunnable.java:49)
> > > >>       at
> > >
> >
> org.apache.cloudstack.managed.context.impl.DefaultManagedContext$1.cal
> > > l
> > > (DefaultManagedContext.java:56)
> > > >>       at
> > >
> >
> org.apache.cloudstack.managed.context.impl.DefaultManagedContext.callW
> > > ithContext(DefaultManagedContext.java:103)
> > > >>       at
> > >
> >
> org.apache.cloudstack.managed.context.impl.DefaultManagedContext.runWi
> > > thContext(DefaultManagedContext.java:53)
> > > >>       at
> > >
> >
> org.apache.cloudstack.managed.context.ManagedContextRunnable.run(Man
> > > agedContextRunnable.java:46)
> > > >>       at
> > > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:4
> > > 71
> > > )
> > > >>       at
> java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304)
> > > >>       at
> > > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask
> > > .a
> > > c
> > > cess$301(ScheduledThreadPoolExecutor.java:178)
> > > >>       at
> > > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask
> > > .r
> > > u
> > > n(ScheduledThreadPoolExecutor.java:293)
> > > >>       at
> > > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor
> > > .j
> > > av
> > > a:1145)
> > > >>       at
> > >
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.
> > > ja
> > > va:615)
> > > >>       at java.lang.Thread.run(Thread.java:744)
> > > >> Caused by: com.cloud.cluster.ClusterInvalidSessionException:
> > > >> runid
> > > 1404924278923 is no longer valid
> > > >>       at
> > >
> >
> com.cloud.cluster.dao.ManagementServerHostDaoImpl.update(Management
> > > ServerHostDaoImpl.java:147)
> > > >>       ... 26 more
> > > >> 2014-07-09 18:45:06,543 ERROR [c.c.c.ClusterManagerImpl]
> > > >> (Cluster-Heartbeat-1:ctx-5f2f8ad5) Unexpected exception in
> > > >> cluster heartbeat
> > > >> java.lang.RuntimeException: update:Exception:Invalid cluster
> > > >> session
> > > detected
> > > >>       at
> > >
> >
> com.cloud.cluster.dao.ManagementServerHostDaoImpl.update(Management
> > > ServerHostDaoImpl.java:155)
> > > >>       at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> > > >>       at
> > >
> >
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.j
> > > a
> > > va:57)
> > > >>       at
> > >
> >
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccess
> > > orImpl.java:43)
> > > >>       at java.lang.reflect.Method.invoke(Method.java:606)
> > > >>       at
> > > org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflect
> > > io
> > > n(
> > > AopUtils.java:317)
> > > >>       at
> > > org.springframework.aop.framework.ReflectiveMethodInvocation.invokeJ
> > > oi
> > > n
> > > point(ReflectiveMethodInvocation.java:183)
> > > >>       at
> > >
> >
> org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(R
> > > eflectiveMethodInvocation.java:150)
> > > >>       at
> > > com.cloud.utils.db.TransactionContextInterceptor.invoke(TransactionC
> > > on
> > > tex
> > > tInterceptor.java:34)
> > > >>       at
> > >
> >
> org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(R
> > > eflectiveMethodInvocation.java:161)
> > > >>       at
> > > org.springframework.aop.interceptor.ExposeInvocationInterceptor.invo
> > > ke
> > > (Ex
> > > poseInvocationInterceptor.java:91)
> > > >>       at
> > >
> >
> org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(R
> > > eflectiveMethodInvocation.java:172)
> > > >>       at
> > >
> >
> org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynam
> > > icAopProxy.java:204)
> > > >>       at com.sun.proxy.$Proxy158.update(Unknown Source)
> > > >>       at
> > >
> >
> com.cloud.cluster.ClusterManagerImpl$4.runInContext(ClusterManagerImpl
> > > .java:545)
> > > >>       at
> > >
> >
> org.apache.cloudstack.managed.context.ManagedContextRunnable$1.run(M
> > > anagedContextRunnable.java:49)
> > > >>       at
> > >
> >
> org.apache.cloudstack.managed.context.impl.DefaultManagedContext$1.cal
> > > l
> > > (DefaultManagedContext.java:56)
> > > >>       at
> > >
> >
> org.apache.cloudstack.managed.context.impl.DefaultManagedContext.callW
> > > ithContext(DefaultManagedContext.java:103)
> > > >>       at
> > >
> >
> org.apache.cloudstack.managed.context.impl.DefaultManagedContext.runWi
> > > thContext(DefaultManagedContext.java:53)
> > > >>       at
> > >
> >
> org.apache.cloudstack.managed.context.ManagedContextRunnable.run(Man
> > > agedContextRunnable.java:46)
> > > >>       at
> > > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:4
> > > 71
> > > )
> > > >>       at
> java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304)
> > > >>       at
> > > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask
> > > .a
> > > c
> > > cess$301(ScheduledThreadPoolExecutor.java:178)
> > > >>       at
> > > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask
> > > .r
> > > u
> > > n(ScheduledThreadPoolExecutor.java:293)
> > > >>       at
> > > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor
> > > .j
> > > av
> > > a:1145)
> > > >>       at
> > >
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.
> > > ja
> > > va:615)
> > > >>       at java.lang.Thread.run(Thread.java:744)
> > > >> Caused by: com.cloud.utils.exception.CloudRuntimeException:
> > > >> Invalid
> > > cluster session detected
> > > >>       at
> > >
> >
> com.cloud.cluster.dao.ManagementServerHostDaoImpl.update(Management
> > > ServerHostDaoImpl.java:147)
> > > >>       ... 26 more
> > > >> Caused by: com.cloud.cluster.ClusterInvalidSessionException:
> > > >> runid
> > > 1404924278923 is no longer valid
> > > >>       at
> > >
> >
> com.cloud.cluster.dao.ManagementServerHostDaoImpl.update(Management
> > > ServerHostDaoImpl.java:147)
> > > >>       ... 26 more
> > > >>
> > > >>
> > > >>
> > > >


Mime
View raw message