db-derby-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Knut Anders Hatlen (JIRA)" <j...@apache.org>
Subject [jira] Commented: (DERBY-3909) Race condition in NetXAResource.removeXaresFromSameRMchain()
Date Sun, 12 Oct 2008 14:50:46 GMT

    [ https://issues.apache.org/jira/browse/DERBY-3909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12638851#action_12638851
] 

Knut Anders Hatlen commented on DERBY-3909:
-------------------------------------------

There are more problems with the handling of this data structure.

I wrote a test which had 10 threads that opened and closed XA
connections in parallel, and every 5 seconds it would print how many
operations it had performed. The test showed that the number of
operations per interval decreased over time, and when I instrumented
the code I found that the vector xaResourceSameRMGroup_ in
NetXAResource just kept growing. xaResourceSameRMGroup_ holds the
singly-linked lists that this issue was originally opened for.

Since the vector is meant to have one item per resource manager
(which, as I understand it, is a Derby network server in this
context), and all the threads connect to the same server, the vector
should never have more than one element in this test.

The problem is this:

NetXAResource.initForReuse() scans through the vector to find the
correct group to add the XAResource to. The correct group is one for
which isSameRM() returns true. XAResources that belong to the same
group are chained.

But if the first XAResource in one of the chains belongs to a closed
connection whose XAResource has not yet been removed from the chain
(yet another race condition), an XAException is raised, and
initForReuse() notices it and just continues the search on the next
element in the vector. If the group that was skipped because of the
XAException, was the group that the XAResource actually should have
been added to, the search won't find a matching group, and a new
element will be added to the vector. So we end up with multiple chains
for a single group.

Also, since the XAResource may even exist from before in the skipped
group, it will end up two places in the vector. When the XAConnection
is closed later on, only one of the occurrences of the XAResource will
be removed from the vector, which means that the vector could become
very large over time.

Even worse, the next pointers used to chain NetXAResource objects that
belong to the same group, are not cleared when the object is removed
from the chain or when the NetXAResource is added as the root of a new
group. If it is already part of a group that was skipped because of an
XAException, it could have a next pointer which points to another
chain, and with some bad luck we could end up with a cycle in one of
the chains and make initForReuse() and removeXaresFromSameRMchain()
vulnerable to infinite loops.

It looks like the problem could be solved simply by turning around the
call to NetXAResource.isSameRM() in initForReuse(), from
xaResourceGroup.isSameRM(this) to isSameRM(xaResourceGroup). Since the
connection used by "this" is not closed, the XAException is not
thrown, and the NetXAResource is added to the correct group.

However, I can't see that this data structure is used anywhere in the
code. The fields that build up the structure are only used in
initForReuse() and removeXaresFromSameRMchain(), so we only put things
in and remove them, without doing anything useful with them in
between. Given that the code that handles the data structure has a
number of problems, is fairly complex and doesn't seem to do anything
useful, I would suggest that we remove it instead of trying to fix it.

> Race condition in NetXAResource.removeXaresFromSameRMchain()
> ------------------------------------------------------------
>
>                 Key: DERBY-3909
>                 URL: https://issues.apache.org/jira/browse/DERBY-3909
>             Project: Derby
>          Issue Type: Bug
>          Components: Network Client
>    Affects Versions: 10.5.0.0
>            Reporter: Knut Anders Hatlen
>            Assignee: Knut Anders Hatlen
>         Attachments: d3909.diff
>
>
> NetXAResource.removeXaresFromSameRMchain() does the following to remove a NetXAResource
from what's logically a singly-linked list:
> 1) Mark the NetXAResource to remove with a flag (a field called ignoreMe_)
> 2) Synchronize on an object that protects the linked list
> 3) Follow the chain of next pointers in the linked list and remove the first flagged
object
> 4) Release synchronization lock obtained in (2)
> 5) Clear the flag set in (1)
> Now, say that two threads (T1 and T2) perform step 1 in parallel. T1 is granted the synchronization
lock in (2), and T2 must wait. T1 traverses the linked list, finds the object flagged by T2
and removes it. Further T1 releases the synchronization lock and clears the flag on the object
it had flagged. This is not the same object that it removed, so when T2 is granted the synchronization
lock, there is no object flagged for removal. As a result, only the object T2 attempted to
remove was in fact removed. The object that T1 flagged for removal is still in the linked
list.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


Mime
View raw message