activemq-users mailing list archives

From Alex Hooper <ahoo...@bmjgroup.com>
Subject Re: change strategy for determining failure of primary in JDBC-backed setup
Date Thu, 24 May 2012 13:56:47 GMT
Gary Tully uttered:
> You would need to write some code, but the locker implementation can
> be easily overridden.
> The interface is: org.apache.activemq.store.jdbc.DatabaseLocker
> 
> It acquires the lock in start, which typically blocks until it can get a
> lock, and there are periodic calls to keepAlive once the lock is
> obtained.
> 
> It is set via XML config on the JDBCPersistenceAdapter via
> org.apache.activemq.store.jdbc.JDBCPersistenceAdapter#setDatabaseLocker
> 

Ah, excellent; I'd not managed to tease that information out of the docs.
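
For the record, a minimal sketch of how I understand that wiring would look in
activemq.xml. The class name com.example.LeaseDatabaseLocker is hypothetical
(it stands in for whatever custom DatabaseLocker implementation gets written),
"#oracle-ds" refers to the data source bean shown further down, and the nested
<databaseLocker> element is my assumption from the setter name, so treat this
as untested.

```xml
<persistenceAdapter>
  <jdbcPersistenceAdapter dataSource="#oracle-ds">
    <!-- assumed to map to JDBCPersistenceAdapter#setDatabaseLocker -->
    <databaseLocker>
      <!-- hypothetical custom implementation of
           org.apache.activemq.store.jdbc.DatabaseLocker -->
      <bean xmlns="http://www.springframework.org/schema/beans"
            class="com.example.LeaseDatabaseLocker" />
    </databaseLocker>
  </jdbcPersistenceAdapter>
</persistenceAdapter>
```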

> A lease type strategy may make sense, where a read to determine if
> there is an existing owner is followed by a poll when the lease is
> expired or an update to start a new lease if none exists. The owner of
> the lease needs to renew before it expires and that interval needs to
> be configurable to allow timely reclamation.
> 
> In the event that the connection drops, if it is recreated before the
> lease expires, the master/slave state is retained. If the lease has
> expired, a master and slave will contend for the lock to be the new
> master.

Yes, that might make sense, thanks. Will need further pondering...

> 
> In your setup, it is odd that the dropped connection does not cause
> the lock keepAlive to fail and the broker to terminate. It should,
> unless there are TCP-level options that need to kick in to see the
> half close. Alternatively, some connection pool config may be able to
> pick up on the failure; there are some validation options on commons
> dbcp that could help there.

I had already identified the commons pool options that might help and have 
configured thusly:

   <bean id="oracle-ds"
   class="org.apache.commons.dbcp.BasicDataSource"
   destroy-method="close">
     <property name="driverClassName"
     value="oracle.jdbc.driver.OracleDriver" />
     <property name="url" value="jdbc:oracle:thin:@oracle:1521:bmj01" />
     <property name="username" value="activemq" />
     <property name="password" value="activemq" />
     <property name="maxActive" value="200" />
     <property name="maxIdle" value="5" />
     <property name="testWhileIdle" value="true" />
     <property name="timeBetweenEvictionRunsMillis" value="30000" />
     <property name="validationQuery" value="SELECT 1 FROM DUAL" />
     <property name="poolPreparedStatements" value="true" />
   </bean>


I have yet to try 'removeAbandoned' as that doesn't seem to be appropriate.

Interestingly, netstat on the slave activemq box shows an ESTABLISHED TCP 
connection to the oracle server, but the oracle server shows no socket in any 
state connected to the slave activemq. Which kind of explains why activemq isn't 
noticing the connection drop. So... maybe the 'removeAbandoned' option is 
appropriate, as the connection is not getting cleared by the dbcp checks because 
the connection that has been used to issue the "SELECT * FROM ACTIVEMQ_LOCK FOR 
UPDATE" is deemed as being active and thus never checked.
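
A sketch of the extra pool settings I have in mind, as they would sit in the
oracle-ds bean above. The property names are standard commons-dbcp 1.x
BasicDataSource settings; the values are illustrative guesses, not tested
ones. One caveat: as far as I understand dbcp 1.x, abandoned connections are
only reclaimed when the pool is close to exhausted, so with maxActive at 200
this may never fire, and whether it is safe to reclaim the connection holding
the lock is a separate question.

```xml
<bean id="oracle-ds"
      class="org.apache.commons.dbcp.BasicDataSource"
      destroy-method="close">
  <!-- driverClassName, url, username, password etc. as in the config above -->
  <!-- validate a connection each time it is borrowed from the pool -->
  <property name="testOnBorrow" value="true" />
  <property name="validationQuery" value="SELECT 1 FROM DUAL" />
  <!-- reclaim connections held longer than removeAbandonedTimeout (seconds);
       300 is an assumed value for illustration -->
  <property name="removeAbandoned" value="true" />
  <property name="removeAbandonedTimeout" value="300" />
  <!-- log a stack trace for the code that borrowed an abandoned connection -->
  <property name="logAbandoned" value="true" />
</bean>
```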

More fundamentally, of course, I need to work out what's going wrong at the TCP 
level and sort that.

[snip]
> 
> Hopefully the above will help, but start with determining why, in your
> current setup, the lock keepalive is not triggering when the
> connection is dropped, because that is a little odd, unless you have
> org.apache.activemq.store.jdbc.JDBCPersistenceAdapter#setLockKeepAlivePeriod
> set to 0.

Is that option configurable in the XML config?
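
Going by how the xbean config maps bean setters to XML attributes, I would
guess it looks something like the sketch below. This is untested; the 5000
value is just an illustration, and "#oracle-ds" refers to the data source bean
defined earlier in this message.

```xml
<persistenceAdapter>
  <!-- assumed attribute for JDBCPersistenceAdapter#setLockKeepAlivePeriod,
       in milliseconds; per the note above, 0 would mean no keepalive calls -->
  <jdbcPersistenceAdapter dataSource="#oracle-ds" lockKeepAlivePeriod="5000" />
</persistenceAdapter>
```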

Anyway, thanks, Gary, for a detailed and pertinent response. This has given me a 
few things to try.

Cheers,

Alex.


> 
> 
> On 24 May 2012 11:45, Alex Hooper <ahooper@bmjgroup.com> wrote:
>> Hi,
>>
>> We are running activemq 5.5.1 in an active/passive failover configuration
>> with JDBC Persistence to an Oracle backend. The default strategy for
>> determining whether the current master has failed is for the secondary
>> server to attempt to get a lock on the database table, waiting indefinitely
>> for the lock to be granted.
>>
>> This is not working (at least in our context) as, after a relatively short
>> time in operation (a handful of hours at most) the connection to Oracle is
>> dropped. Activemq doesn't notice this, so the secondary sits there happily
>> waiting for a lock it can now never get and, in the event of a failure,
>> won't serve any clients as it is not a master.
>>
>> Is there some way to change the decision mechanism to, eg, a polling
>> strategy? Or can anyone suggest another resolution to this problem?
>>
>> Alex.
>> --
>> Alex Hooper
>> Operations Team Leader, BMJ Group, BMA House, London WC1H 9JR
>> Tel: +44 (0) 20 7383 6049
>> http://group.bmj.com/
>>
-- 
Alex Hooper
Operations Team Leader, BMJ Group, BMA House, London WC1H 9JR
Tel: +44 (0) 20 7383 6049
http://group.bmj.com/
