cloudstack-dev mailing list archives

From Marcus Sorensen <shadow...@gmail.com>
Subject Re: database connection resilience
Date Sun, 07 Jul 2013 22:36:52 GMT
I think there are two separate issues here.

1) The management server uses the database to determine cluster
membership, and if no database connection can be made, the management
server fences itself (shuts down). This is good in a cluster, but where
there's only one management server (no cluster intended), it seems
like an issue. Then again, it may be better to shut down; I'm not sure
how the management server will react after a temporary database
outage. Some opinions would be appreciated. My preference would be
for a single management server to just pick back up where it left off
rather than dying.

2) There is no support for the MySQL JDBC driver's built-in
load-balancing features. I have a patch that fixes this, but I noticed
a few things that I'd like some feedback on. Namely, the awsapi
database connection doesn't have its own settings; instead it uses the
same host connection settings as the cloud db and the autoReconnect
setting from the usage database settings. Was this a shortcut, or is
there a reason for it? My current version of the patch keeps the same
methodology, but since I'm adding properties to db.properties anyway,
we could allow dedicated db.awsapi.host and db.awsapi.port settings.
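
To make the idea in 2) concrete, here is a minimal Java sketch, not the
actual patch. It assumes Connector/J's jdbc:mysql:loadbalance:// URL
scheme and a comma-separated host list in the host property. The
db.awsapi.host, db.awsapi.port and db.awsapi.name properties are
hypothetical and fall back to the cloud db values when unset; the class
name, method names and the "cloudbridge" default are illustrative
assumptions as well.

```java
import java.util.Properties;

// Illustrative sketch only; names here are assumptions, not CloudStack code.
final class DbUrlSketch {

    // Builds a Connector/J load-balanced URL from a comma-separated host list.
    static String loadBalancedUrl(String hosts, String port, String dbName, boolean autoReconnect) {
        StringBuilder sb = new StringBuilder("jdbc:mysql:loadbalance://");
        String[] parts = hosts.split(",");
        for (int i = 0; i < parts.length; i++) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append(parts[i].trim()).append(':').append(port);
        }
        sb.append('/').append(dbName).append("?autoReconnect=").append(autoReconnect);
        return sb.toString();
    }

    // Hypothetical db.awsapi.* properties, falling back to the cloud db
    // settings. autoReconnect is still read from the usage db settings,
    // mirroring the behaviour described above.
    static String awsapiUrl(Properties p) {
        String host = p.getProperty("db.awsapi.host", p.getProperty("db.cloud.host", "localhost"));
        String port = p.getProperty("db.awsapi.port", p.getProperty("db.cloud.port", "3306"));
        String name = p.getProperty("db.awsapi.name", "cloudbridge"); // assumed default
        boolean autoReconnect = Boolean.parseBoolean(p.getProperty("db.usage.autoReconnect", "false"));
        return loadBalancedUrl(host, port, name, autoReconnect);
    }
}
```

With only db.cloud.host set, awsapiUrl keeps today's shared-host
behaviour; setting db.awsapi.host or db.awsapi.port overrides it without
touching the cloud db settings.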

On Sun, Jul 7, 2013 at 1:02 AM, Marcus Sorensen <shadowsor@gmail.com> wrote:
> Oh, and I should correct myself, it doesn't crash, it seems that the
> management server fences itself because it can't talk to the database.
>
> On Sun, Jul 7, 2013 at 12:59 AM, Marcus Sorensen <shadowsor@gmail.com> wrote:
>> Ok. After a cursory look, I've seen that the autoReconnect is kind of
>> a bad option for jdbc. I've also found this, which seems kind of hairy
>> for what I want to do:
>>
>> http://dev.mysql.com/doc/refman/5.0/en/connector-j-usagenotes-j2ee-concepts-managing-load-balanced-connections.html
>>
>> I don't necessarily want to hand off the loadbalancing management to
>> the java code, I just want cloudstack to automatically reinitialize
>> the database connection when this 'communications link failure'
>> occurs, maybe with a db.cloud.connection.retry.count property or
>> similar.
>>
>> On Sun, Jul 7, 2013 at 12:54 AM, Wido den Hollander <wido@widodh.nl> wrote:
>>> Hi,
>>>
>>>
>>> On 07/07/2013 08:45 AM, Marcus Sorensen wrote:
>>>>
>>>> I see that my db.properties has db.cloud.autoReconnect=true, which
>>>> translates to setting autoReconnect in the jdbc driver connection in
>>>> utils/src/com/cloud/utils/db/Transaction.java. I also see that if I
>>>> manually trigger the issue I get:
>>>>
>>>
>>> Just to confirm, I see the same issues. I haven't looked into this yet, but
>>> this is also one of the things I want to have fixed.
>>>
>>> Maybe create an issue for it?
>>>
>>> Wido
>>>
>>>
>>>> 2013-07-07 00:42:50,502 ERROR [cloud.cluster.ClusterManagerImpl]
>>>> (Cluster-Heartbeat-1:null) Runtime DB exception
>>>> com.mysql.jdbc.exceptions.jdbc4.CommunicationsException:
>>>> Communications link failure
>>>>
>>>> The last packet successfully received from the server was 1,503
>>>> milliseconds ago.  The last packet sent successfully to the server was
>>>> 0 milliseconds ago.
>>>> at sun.reflect.GeneratedConstructorAccessor159.newInstance(Unknown Source)
>>>> at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>>>> at java.lang.reflect.Constructor.newInstance(Constructor.java:532)
>>>> at com.mysql.jdbc.Util.handleNewInstance(Util.java:411)
>>>> at com.mysql.jdbc.SQLError.createCommunicationsException(SQLError.java:1117)
>>>> at com.mysql.jdbc.MysqlIO.reuseAndReadPacket(MysqlIO.java:3567)
>>>> at com.mysql.jdbc.MysqlIO.reuseAndReadPacket(MysqlIO.java:3456)
>>>> at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3997)
>>>> at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:2468)
>>>> at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2629)
>>>> at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2719)
>>>> at com.mysql.jdbc.PreparedStatement.executeInternal(PreparedStatement.java:2155)
>>>> at com.mysql.jdbc.PreparedStatement.executeQuery(PreparedStatement.java:2318)
>>>> at org.apache.commons.dbcp.DelegatingPreparedStatement.executeQuery(DelegatingPreparedStatement.java:96)
>>>> at org.apache.commons.dbcp.DelegatingPreparedStatement.executeQuery(DelegatingPreparedStatement.java:96)
>>>> at com.cloud.utils.db.GenericDaoBase.searchIncludingRemoved(GenericDaoBase.java:409)
>>>> at com.cloud.utils.component.ComponentInstantiationPostProcessor$InterceptorDispatcher.intercept(ComponentInstantiationPostProcessor.java:125)
>>>> at com.cloud.utils.db.GenericDaoBase.searchIncludingRemoved(GenericDaoBase.java:350)
>>>> at com.cloud.utils.component.ComponentInstantiationPostProcessor$InterceptorDispatcher.intercept(ComponentInstantiationPostProcessor.java:125)
>>>> at com.cloud.utils.db.GenericDaoBase.listIncludingRemovedBy(GenericDaoBase.java:907)
>>>> at com.cloud.utils.component.ComponentInstantiationPostProcessor$InterceptorDispatcher.intercept(ComponentInstantiationPostProcessor.java:125)
>>>> at com.cloud.utils.db.GenericDaoBase.listIncludingRemovedBy(GenericDaoBase.java:912)
>>>> at com.cloud.utils.component.ComponentInstantiationPostProcessor$InterceptorDispatcher.intercept(ComponentInstantiationPostProcessor.java:125)
>>>> at com.cloud.cluster.dao.ManagementServerHostDaoImpl.getActiveList(ManagementServerHostDaoImpl.java:158)
>>>> at com.cloud.utils.component.ComponentInstantiationPostProcessor$InterceptorDispatcher.intercept(ComponentInstantiationPostProcessor.java:125)
>>>> at com.cloud.cluster.ClusterManagerImpl.peerScan(ClusterManagerImpl.java:1057)
>>>> at com.cloud.cluster.ClusterManagerImpl.access$1200(ClusterManagerImpl.java:95)
>>>> at com.cloud.cluster.ClusterManagerImpl$4.run(ClusterManagerImpl.java:789)
>>>> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>>>> at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:351)
>>>> at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:178)
>>>> at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:165)
>>>> at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:267)
>>>> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146)
>>>> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>> at java.lang.Thread.run(Thread.java:679)
>>>> Caused by: java.io.EOFException: Can not read response from server.
>>>> Expected to read 4 bytes, read 0 bytes before connection was
>>>> unexpectedly lost.
>>>> ... 55 more
>>>> 2013-07-07 00:42:50,505 ERROR [cloud.cluster.ClusterManagerImpl]
>>>> (Cluster-Heartbeat-1:null) DB communication problem detected, fence it
>>>>
>>>> And I only have to restart cloudstack-management so that it connects to
>>>> another member of the load-balanced multi-master database to get things
>>>> running again.
>>>>
>>>>
>>>> On Sun, Jul 7, 2013 at 12:35 AM, Marcus Sorensen <shadowsor@gmail.com>
>>>> wrote:
>>>>>
>>>>> I've noticed that the cloudstack management server creates persistent
>>>>> connections to the database, and crashes if the database connection is
>>>>> lost. I haven't looked at the code yet, but I was wondering if anyone
>>>>> knew what was going on here: whether it's simply not set up to
>>>>> gracefully handle a reconnect, or something else. We have a
>>>>> multi-master database setup, but cloudstack doesn't take advantage of
>>>>> it since it doesn't attempt a graceful reconnect; if the particular node
>>>>> it connected to on startup goes down, it simply crashes.
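
The graceful-reconnect idea raised in this thread (retrying on a
'communications link failure', bounded by something like a
db.cloud.connection.retry.count property) could be sketched roughly as
below. This is a minimal illustration under stated assumptions, not
CloudStack code: the class name, method name and delay parameter are
hypothetical, and real code would likely need to discard the broken
connection and narrow which exceptions count as retryable.

```java
import java.sql.SQLException;
import java.util.concurrent.Callable;

// Illustrative sketch only; names here are assumptions, not CloudStack code.
final class DbRetrySketch {

    // Runs the work once, then retries up to retryCount more times when it
    // fails with an SQLException, sleeping delayMillis between attempts.
    // Any other exception propagates immediately.
    static <T> T withRetry(int retryCount, long delayMillis, Callable<T> work) throws Exception {
        if (retryCount < 0) {
            throw new IllegalArgumentException("retryCount must be >= 0");
        }
        SQLException last = null;
        for (int attempt = 0; attempt <= retryCount; attempt++) {
            try {
                // The callable is expected to obtain a fresh connection itself.
                return work.call();
            } catch (SQLException e) {
                last = e;
                if (attempt < retryCount) {
                    Thread.sleep(delayMillis);
                }
            }
        }
        // All attempts failed; surface the last database error.
        throw last;
    }
}
```

A caller would read the retry count from db.properties and wrap each
database operation in withRetry, so that a node failure in a
multi-master setup leads to a new connection attempt instead of an
immediate fence.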
