activemq-dev mailing list archives

From "Mario Siegenthaler (JIRA)" <>
Subject [jira] Commented: (AMQ-1350) JDBC master/slave does not work properly with datasources that can reconnect to the database
Date Thu, 02 Aug 2007 23:43:49 GMT


Mario Siegenthaler commented on AMQ-1350:

I did some further research on this topic. Here's what I'm going for:
  a) Lock something (e.g. the lock table) on startup
      + on success: goto b
      + else: keep trying to lock until you succeed (repeat a)
  b) Start the broker and a keep-alive thread (executed every x seconds -> c)
  c) Check whether we still hold the lock (and that the db is still there)
      + if we do: wait till the next keep-alive, then execute c)
      + else: d)
  d) Shut down the broker because there's another master running
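To make the flow concrete, here's a minimal Java sketch of steps a)-d). This is only an illustration, not ActiveMQ code: the two BooleanSuppliers stand in for the real database calls (a non-blocking lock attempt for step a, and a "do we still hold the lock" check for step c), and all names are made up.

```java
import java.util.function.BooleanSupplier;

// Illustrative sketch of the election loop; the suppliers are stand-ins
// for the actual JDBC lock operations.
class MasterElection {
    private final BooleanSupplier tryLockNoWait;  // step a: non-blocking lock attempt
    private final BooleanSupplier stillHoldsLock; // step c: keep-alive check
    private final long retryMillis;
    private volatile boolean brokerRunning = false;

    MasterElection(BooleanSupplier tryLockNoWait, BooleanSupplier stillHoldsLock,
                   long retryMillis) {
        this.tryLockNoWait = tryLockNoWait;
        this.stillHoldsLock = stillHoldsLock;
        this.retryMillis = retryMillis;
    }

    // a) retry until the lock is acquired, then b) "start" the broker.
    void start() {
        while (!tryLockNoWait.getAsBoolean()) {
            try {
                Thread.sleep(retryMillis); // retry interval before the next attempt
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
        }
        brokerRunning = true; // b) we are master now
    }

    // c) called every x seconds by the keep-alive thread; d) if the lock
    // was lost, shut the broker down because another master may be running.
    boolean keepAlive() {
        if (brokerRunning && !stillHoldsLock.getAsBoolean()) {
            brokerRunning = false; // d) demote ourselves
        }
        return brokerRunning;
    }

    boolean isMaster() {
        return brokerRunning;
    }
}
```

In the real thing, step b would also start a ScheduledExecutorService (or similar) that invokes keepAlive() periodically; the sketch leaves that out to keep the logic visible.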

Now the tricky part of this idea is actually step c), because there's no standard SQL way to express
"go see if you can lock that row/table/whatever, and return immediately if it's already locked"
(something like a tryLockNoWait). There isn't even a standard way to express a lock-wait timeout.
While it's possible to simulate a lock timeout (e.g. terminate the query after 5 seconds and consider
the table locked by another party), this is an unclean and, IMO, risky approach.

I can offer a solution for three database systems:
* MySQL: select get_lock('my_activemq_lock', 0); does exactly what I want. It doesn't
use the lock table, though.
* MS SQL-Server: select * from activemq_lock with (readpast) where id=1 would skip the row
if it's locked, without waiting, so we can look at the result count. The same should also be
possible with an update statement.
* Oracle: Is supposed to have the same feature as SQL-Server, although with a slightly different
syntax.

My research for a DB2 solution was unsuccessful; I haven't tried the other databases yet.
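For the MySQL variant, a hedged JDBC sketch of what the check could look like (again illustrative, not ActiveMQ code; the lock name is made up). GET_LOCK('name', 0) returns 1 if the lock was acquired, 0 if another session holds it, and NULL on error; note the lock is tied to the connection, so the master has to keep that same connection open for as long as it wants to stay master.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

// Illustrative sketch of the MySQL-specific tryLockNoWait check.
class MySqlLockCheck {
    // Pure decision logic: only a result of exactly 1 means "lock acquired";
    // 0 (held by someone else) and NULL (error) both mean "not acquired".
    static boolean interpretGetLock(Number result) {
        return result != null && result.intValue() == 1;
    }

    // JDBC wrapper; needs a live MySQL connection to actually run.
    static boolean tryLockNoWait(Connection conn) throws SQLException {
        try (PreparedStatement ps =
                 conn.prepareStatement("SELECT GET_LOCK('my_activemq_lock', 0)");
             ResultSet rs = ps.executeQuery()) {
            Number result = rs.next() ? (Number) rs.getObject(1) : null;
            return interpretGetLock(result);
        }
    }
}
```

The SQL-Server READPAST variant would work the same way, except that "lock acquired" is decided by whether the select returns a row at all.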

Any feedback on this solution?

> JDBC master/slave does not work properly with datasources that can reconnect to the database
> --------------------------------------------------------------------------------------------
>                 Key: AMQ-1350
>                 URL:
>             Project: ActiveMQ
>          Issue Type: Bug
>          Components: Message Store
>    Affects Versions: 5.x
>         Environment: Linux x86_64, Sun jdk 1.6, Postgresql 8.2.4, c3p0 or other pooling
>            Reporter: Eric Anderson
>         Attachments: activemq-master-slave.patch
> This problem involves the JDBC master/slave configuration when the db server is restarted,
> or when the brokers temporarily lose their JDBC connections for whatever reason, and when
> a datasource is in use that can re-establish stale connections prior to providing them to
> the broker.
> The problem lies with the JDBC locking strategy used to determine which broker is master
> and which are slaves.  Let's say there are two brokers, a master and a slave, and they've
> successfully initialized.  If you restart the database server, the slave will throw an exception
> because it's just caught an exception while blocked attempting to get the lock.  The slave
> will then *retry* the process of getting a lock over and over again.  Now, since the database
> was bounced, the *master* will have lost its lock in the activemq_lock table.  However, with
> the current 4.x-5.x code, it will never "know" that it has lost the lock.  There is no mechanism
> to check the lock state.  So it will continue to think that it is the master and will leave
> all of its network connectors active.
> When the slave now tries to acquire the lock, it will succeed if the datasource has restored
> connections to the now-restarted database server.  The slave will come up as master,
> and there will be two masters active concurrently.  Both masters should at this point be fully
> functional, as both will have datasources that can talk to the database server once again.
> I have tested this with c3p0 and verified that I get two masters after bouncing the database
> server.  If, at that point, I kill the original slave broker, the original master still appears
> to be functioning normally.  If, instead, I kill the original master broker, messages are
> still delivered via the original slave (now co-master).  It does not seem to matter which
> broker the clients connect to - both work.
> There is no workaround that I can think of that would function correctly across multiple
> database bounces.  If a slave's datasource does not have the functionality to do database
> reconnects, then, after the first database server restart, it will never be able to establish
> a connection to the db server in order to attempt to acquire the lock.  This, combined with
> the fact that the JDBC master/slave topology does not have any favored brokers (all can
> be masters or slaves depending on start-up order and the failures that have occurred over
> time), means that a datasource that can do reconnects is required on all brokers.  Therefore
> it would seem that in the JDBC master/slave topology a database restart or temporary loss
> of database connectivity will always result in multiple masters.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
