cloudstack-dev mailing list archives

From Wido den Hollander <w...@widodh.nl>
Subject Re: database connection resilience
Date Sun, 07 Jul 2013 06:54:17 GMT
Hi,

On 07/07/2013 08:45 AM, Marcus Sorensen wrote:
> I see that my db.properties has db.cloud.autoReconnect=true, which
> translates to setting autoReconnect in the jdbc driver connection in
> utils/src/com/cloud/utils/db/Transaction.java. I also see that if I
> manually trigger the issue I get:
>

Just to confirm, I see the same issues. I haven't looked into this yet, 
but this is also one of the things I want to have fixed.

Maybe create an issue for it?

Wido

> 2013-07-07 00:42:50,502 ERROR [cloud.cluster.ClusterManagerImpl]
> (Cluster-Heartbeat-1:null) Runtime DB exception
> com.mysql.jdbc.exceptions.jdbc4.CommunicationsException:
> Communications link failure
>
> The last packet successfully received from the server was 1,503
> milliseconds ago.  The last packet sent successfully to the server was
> 0 milliseconds ago.
> at sun.reflect.GeneratedConstructorAccessor159.newInstance(Unknown Source)
> at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
> at java.lang.reflect.Constructor.newInstance(Constructor.java:532)
> at com.mysql.jdbc.Util.handleNewInstance(Util.java:411)
> at com.mysql.jdbc.SQLError.createCommunicationsException(SQLError.java:1117)
> at com.mysql.jdbc.MysqlIO.reuseAndReadPacket(MysqlIO.java:3567)
> at com.mysql.jdbc.MysqlIO.reuseAndReadPacket(MysqlIO.java:3456)
> at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3997)
> at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:2468)
> at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2629)
> at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2719)
> at com.mysql.jdbc.PreparedStatement.executeInternal(PreparedStatement.java:2155)
> at com.mysql.jdbc.PreparedStatement.executeQuery(PreparedStatement.java:2318)
> at org.apache.commons.dbcp.DelegatingPreparedStatement.executeQuery(DelegatingPreparedStatement.java:96)
> at org.apache.commons.dbcp.DelegatingPreparedStatement.executeQuery(DelegatingPreparedStatement.java:96)
> at com.cloud.utils.db.GenericDaoBase.searchIncludingRemoved(GenericDaoBase.java:409)
> at com.cloud.utils.component.ComponentInstantiationPostProcessor$InterceptorDispatcher.intercept(ComponentInstantiationPostProcessor.java:125)
> at com.cloud.utils.db.GenericDaoBase.searchIncludingRemoved(GenericDaoBase.java:350)
> at com.cloud.utils.component.ComponentInstantiationPostProcessor$InterceptorDispatcher.intercept(ComponentInstantiationPostProcessor.java:125)
> at com.cloud.utils.db.GenericDaoBase.listIncludingRemovedBy(GenericDaoBase.java:907)
> at com.cloud.utils.component.ComponentInstantiationPostProcessor$InterceptorDispatcher.intercept(ComponentInstantiationPostProcessor.java:125)
> at com.cloud.utils.db.GenericDaoBase.listIncludingRemovedBy(GenericDaoBase.java:912)
> at com.cloud.utils.component.ComponentInstantiationPostProcessor$InterceptorDispatcher.intercept(ComponentInstantiationPostProcessor.java:125)
> at com.cloud.cluster.dao.ManagementServerHostDaoImpl.getActiveList(ManagementServerHostDaoImpl.java:158)
> at com.cloud.utils.component.ComponentInstantiationPostProcessor$InterceptorDispatcher.intercept(ComponentInstantiationPostProcessor.java:125)
> at com.cloud.cluster.ClusterManagerImpl.peerScan(ClusterManagerImpl.java:1057)
> at com.cloud.cluster.ClusterManagerImpl.access$1200(ClusterManagerImpl.java:95)
> at com.cloud.cluster.ClusterManagerImpl$4.run(ClusterManagerImpl.java:789)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
> at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:351)
> at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:178)
> at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:165)
> at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:267)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:679)
> Caused by: java.io.EOFException: Can not read response from server.
> Expected to read 4 bytes, read 0 bytes before connection was
> unexpectedly lost.
> ... 55 more
> 2013-07-07 00:42:50,505 ERROR [cloud.cluster.ClusterManagerImpl]
> (Cluster-Heartbeat-1:null) DB communication problem detected, fence it
>
> To get things running again, I only have to restart
> cloudstack-management so that it connects to another member of the
> load-balanced multi-master database.
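
One way to let the driver itself move to another database member, rather than relying on a restart, is to list several hosts in the JDBC URL. The following is only a sketch under assumptions: FailoverUrlSketch, buildUrl, the host names and the database name are hypothetical and are not CloudStack code. Only the db.cloud.autoReconnect key comes from the thread. The comma-separated host list and the failOverReadOnly property are MySQL Connector/J features as I understand them; check the documentation for the driver version in use.

```java
import java.util.List;
import java.util.Properties;

// Hypothetical helper, not part of CloudStack: builds a Connector/J URL that
// names several hosts so the driver can fail over between them.
final class FailoverUrlSketch {

    static String buildUrl(List<String> hosts, String database, Properties props) {
        if (hosts.isEmpty()) {
            throw new IllegalArgumentException("at least one host is required");
        }
        // db.cloud.autoReconnect is the key quoted in the thread; default to false.
        String autoReconnect = props.getProperty("db.cloud.autoReconnect", "false");

        StringBuilder url = new StringBuilder("jdbc:mysql://");
        url.append(String.join(",", hosts));          // e.g. db1:3306,db2:3306
        url.append('/').append(database);
        url.append("?autoReconnect=").append(autoReconnect);
        // Assumption: without this, a failed-over connection may be read-only.
        url.append("&failOverReadOnly=false");
        return url.toString();
    }
}
```

With hosts db1:3306 and db2:3306 and a database named cloud, this yields a single URL naming both members. Whether failing over inside the driver is safe for a given workload (open transactions, session state) is a separate question that the sketch does not address.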
>
>
> On Sun, Jul 7, 2013 at 12:35 AM, Marcus Sorensen <shadowsor@gmail.com> wrote:
>> I've noticed that the cloudstack management server creates persistent
>> connections to the database, and crashes if the database connection is
>> lost. I haven't looked at the code yet, but I was wondering if anyone
>> knows what is going on here: is it simply not set up to handle
>> reconnects gracefully, or is it something else? We have a
>> multi-master database setup, but cloudstack doesn't take advantage of
>> it because it doesn't attempt a graceful reconnect. If the particular
>> node it connected to on startup goes down, it simply crashes.
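
For illustration, a graceful reconnect could look roughly like the sketch below: retry a database call when the failure is connection-related, and rethrow anything else immediately. This is not how CloudStack handles it. DbRetrySketch, SqlCall and withRetry are hypothetical names, and the check assumes the driver reports lost connections with an SQLState in class 08 (MySQL uses 08S01 for "Communications link failure").

```java
import java.sql.SQLException;

// Hypothetical retry helper, not CloudStack code.
final class DbRetrySketch {

    /** A database operation that may fail with an SQLException. */
    interface SqlCall<T> {
        T run() throws SQLException;
    }

    /** True when the SQLState is in class 08 (connection exception). */
    static boolean isConnectionFailure(SQLException e) {
        String state = e.getSQLState();
        return state != null && state.startsWith("08");
    }

    /** Runs the call, retrying only connection failures, with linear backoff. */
    static <T> T withRetry(SqlCall<T> call, int maxAttempts, long backoffMillis)
            throws SQLException, InterruptedException {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        SQLException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return call.run();
            } catch (SQLException e) {
                if (!isConnectionFailure(e)) {
                    throw e;                       // not a connection problem
                }
                last = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(backoffMillis * attempt);
                }
            }
        }
        throw last;                                // every attempt failed
    }
}
```

A retry like this only helps if each attempt obtains a fresh connection (for example from the pool) and the operation is safe to repeat. It should not be wrapped around part of an open transaction.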
