river-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Patricia Shanahan <p...@acm.org>
Subject Re: Bug fixing
Date Tue, 26 Oct 2010 21:37:19 GMT
I've confirmed the temporary fix discussed below with the trunk code, 
having reverted the rest of the changes I made to simplify the problem 
and extract more data. I'm now rerunning the javaspaces category, 
because I think several failures have the same general flavor and may be 
fixed by this one change.


On 10/26/2010 1:22 PM, Patricia Shanahan wrote:
> On 10/26/2010 8:06 AM, Patricia Shanahan wrote:
>> Greg Trasuk wrote:
>>> The transaction spec says (under "Joining a Transaction") that "The join
>>> method throws CannotJoinException if the transaction is known to the
>>> manager but is no longer active."
>>> So it would appear that this behaviour is a bug, if the transaction's
>>> lease has actually expired.
>>> I had a quick look through TxnManagerImpl and I don't see any
>>> "auto-renew-on-join" behaviour, so that seems kind of odd. In fact, in
>>> the join(...) method, it finds a TxnManagerTransaction instance, then
>>> calls join(..) on it, which appears to check whether the transaction is
>>> expired, and should throw CannotJoinException if it is expired.
>> There is an alternative I need to check. I want see if the JavaSpace is
>> caching the transaction information. Maybe it isn't even checking with
>> the TransactionManager if it thinks it has already joined the
>> transaction, regardless of the length of the transaction's lease.
> My guess was correct! (This is the first time I've found a River bug
> through an intuitive leap, rather than step-by-step brute force debug.)
> The method enterTxn in OutriggerServerImpl does the dirty work. It uses
> a table that maps TransactionManager-id pairs to its internal
> transaction representation, com.sun.jini.outrigger.Txn.
> It begins by doing a look-up on the table which will either get a Txn or
> null. That is followed by this comment:
> "If we find the txn with the probe, we're all done. Just return.
> Otherwise, we need to join the transaction. Of course, there is a race
> condition here. While we are attempting to join, others might be
> attempting to join also. We are careful to ensure that only one of the
> racing threads is able to update our internal data structures with our
> internal representation of this transaction. NB: There are better ways
> of doing this which could ensure we never make an unnecessary remote
> call that we may want to explore later."
> if (txn == null) {
> final TransactionManager mgr = (TransactionManager)
> transactionManagerPreparer.prepareProxy(tr.mgr);
> tr = new ServerTransaction(mgr, tr.id);
> tr.join(participantProxy, crashCount);
> txn = txnTable.put(tr);
> }
> Commenting out the "if (txn == null)" test, so that the join is done
> regardless, makes the test pass.
> My next problem is how to really fix this. Presumably, the caching was
> not done for fun. There is at least some chance that it is a fix for a
> real performance problem, and merely commenting out the test would bring
> back that problem. On the other hand, I'm not sure of all the
> consequences, including effects on recovery, of going on blindly using a
> transaction id after the transaction has really been discarded by its
> manager.
> One approach would be to keep the caching, but try to make it correct.
> Is there any way to find out the scheduled lease termination of an
> existing transaction? If there were, OutriggerSeverImpl could store it
> in the Txn object, and re-do the join if time now is greater.
> On the other hand, I rather feel that if the state of a transaction can
> be safely cached, with a lease-based time limit, it should be done by
> the TransactionManager's proxy. It is more tightly coupled to the
> TransactionManager implementation, and any performance benefit would
> apply to all TransactionManager uses, not just the outrigger JavaSpace
> implementation.
> Thoughts? Suggestions?
> Patricia

View raw message