river-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Peter Firmstone <j...@zeus.net.au>
Subject Re: Bug fixing
Date Thu, 28 Oct 2010 07:49:26 GMT
Hmm, most admirable.



Patricia Shanahan wrote:
> I've confirmed the temporary fix discussed below with the trunk code, 
> having reverted the rest of the changes I made to simplify the problem 
> and extract more data. I'm now rerunning the javaspaces category, 
> because I think several failures have the same general flavor and may 
> be fixed by this one change.
> Patricia
> On 10/26/2010 1:22 PM, Patricia Shanahan wrote:
>> On 10/26/2010 8:06 AM, Patricia Shanahan wrote:
>>> Greg Trasuk wrote:
>>>> The transaction spec says (under "Joining a Transaction") that "The 
>>>> join
>>>> method throws CannotJoinException if the transaction is known to the
>>>> manager but is no longer active."
>>>> So it would appear that this behaviour is a bug, if the transaction's
>>>> lease has actually expired.
>>>> I had a quick look through TxnManagerImpl and I don't see any
>>>> "auto-renew-on-join" behaviour, so that seems kind of odd. In fact, in
>>>> the join(...) method, it finds a TxnManagerTransaction instance, then
>>>> calls join(..) on it, which appears to check whether the 
>>>> transaction is
>>>> expired, and should throw CannotJoinException if it is expired.
>>> There is an alternative I need to check. I want see if the JavaSpace is
>>> caching the transaction information. Maybe it isn't even checking with
>>> the TransactionManager if it thinks it has already joined the
>>> transaction, regardless of the length of the transaction's lease.
>> My guess was correct! (This is the first time I've found a River bug
>> through an intuitive leap, rather than step-by-step brute force debug.)
>> The method enterTxn in OutriggerServerImpl does the dirty work. It uses
>> a table that maps TransactionManager-id pairs to its internal
>> transaction representation, com.sun.jini.outrigger.Txn.
>> It begins by doing a look-up on the table which will either get a Txn or
>> null. That is followed by this comment:
>> "If we find the txn with the probe, we're all done. Just return.
>> Otherwise, we need to join the transaction. Of course, there is a race
>> condition here. While we are attempting to join, others might be
>> attempting to join also. We are careful to ensure that only one of the
>> racing threads is able to update our internal data structures with our
>> internal representation of this transaction. NB: There are better ways
>> of doing this which could ensure we never make an unnecessary remote
>> call that we may want to explore later."
>> if (txn == null) {
>> final TransactionManager mgr = (TransactionManager)
>> transactionManagerPreparer.prepareProxy(tr.mgr);
>> tr = new ServerTransaction(mgr, tr.id);
>> tr.join(participantProxy, crashCount);
>> txn = txnTable.put(tr);
>> }
>> Commenting out the "if (txn == null)" test, so that the join is done
>> regardless, makes the test pass.
>> My next problem is how to really fix this. Presumably, the caching was
>> not done for fun. There is at least some chance that it is a fix for a
>> real performance problem, and merely commenting out the test would bring
>> back that problem. On the other hand, I'm not sure of all the
>> consequences, including effects on recovery, of going on blindly using a
>> transaction id after the transaction has really been discarded by its
>> manager.
>> One approach would be to keep the caching, but try to make it correct.
>> Is there any way to find out the scheduled lease termination of an
>> existing transaction? If there were, OutriggerSeverImpl could store it
>> in the Txn object, and re-do the join if time now is greater.
>> On the other hand, I rather feel that if the state of a transaction can
>> be safely cached, with a lease-based time limit, it should be done by
>> the TransactionManager's proxy. It is more tightly coupled to the
>> TransactionManager implementation, and any performance benefit would
>> apply to all TransactionManager uses, not just the outrigger JavaSpace
>> implementation.
>> Thoughts? Suggestions?
>> Patricia

View raw message