db-derby-dev mailing list archives

From "Knut Anders Hatlen (JIRA)" <j...@apache.org>
Subject [jira] Updated: (DERBY-2991) Index split deadlock
Date Mon, 09 Mar 2009 15:02:50 GMT

     [ https://issues.apache.org/jira/browse/DERBY-2991?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Knut Anders Hatlen updated DERBY-2991:
--------------------------------------

    Attachment: d2991-2b.stat
                d2991-2b.diff

Here's an updated patch (d2991-2b.diff) which addresses the two issues I
mentioned that I was aware of in the 2a patch:

1) Call Page.setRepositionNeeded() in BTreePostCommit.purgeCommittedDeletes()
when a row has been purged.

2) Handle the cases where reposition() can return false (that is, when the
second argument to reposition() is false and the row at the current position
has been purged). This led to the following changes:

* BTreeScan.positionAtDoneScanFromClose()
* BTreeScan.reopenScan()

  Removed the calls to reposition(). The only reason I could see for these
  methods to call reposition() was that some implementations of
  BTreeLockingPolicy.unlockScanRecordAfterRead() had asserts that checked that
  the page of the current position was latched. Removing the calls (and the
  asserts) made the code simpler and removed the need for special handling if
  reposition() was unsuccessful.

* B2IRowLockingRR.unlockScanRecordAfterRead()
* B2IRowLocking2.unlockScanRecordAfterRead()

  Don't assert that the current leaf is latched, as there is no need for that
  latch in order to unlock the record. (See above.)

* BTreeScan.delete()
* BTreeScan.doesCurrentPositionQualify()
* BTreeScan.fetch()
* BTreeScan.isCurrentPositionDeleted()

  Make sure that we don't try to release the latch on the current leaf unless
  we have actually latched it, since the leaf won't be latched if reposition()
  returns false. No other special handling of purged rows is needed in those
  methods, I think. delete() and fetch() throw an exception
  (AM_RECORD_NOT_FOUND) if the row has been purged, which sounds reasonable to
  me. doesCurrentPositionQualify() and isCurrentPositionDeleted() use the
  return value from reposition() to decide what they should return themselves,
  which also sounds fine to me (except that I would expect
  isCurrentPositionDeleted() to return true if the row was purged, but currently
  it returns false; I will file a separate bug for that).

* BTreeMaxScan.fetchMaxRowFromBeginning()
* BTreeForwardScan.fetchRows()

  If the row at the current position of the scan has been purged while we were
  waiting for a lock, so that reposition(pos,false) returns false, we call
  reposition() again with the second argument true to reposition on the row
  immediately to the left of where the purged row was. This effectively takes
  one step back in the scan, so we need to jump to the top of the loop's body
  to move one step forward past the purged row.
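To make the repositioning rules above concrete, here is a minimal sketch in a toy model. It is not Derby's code: LeafPage, Position, reposition(page, pos, missingOk), fetchCurrent() and fetchRemaining() are hypothetical stand-ins that only mimic the behaviour described in items 1 and 2. In this model only purge() sets the repositionNeeded flag, a missing row with missingOk false returns false without latching, and a missing row with missingOk true leaves the position on its left neighbour.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical toy model, not Derby's BTreeScan/Page classes.
public class RepositionSketch {

    /** Stand-in for a B-tree leaf page holding sorted keys. */
    static final class LeafPage {
        final List<Integer> keys = new ArrayList<>();
        boolean repositionNeeded; // set on purge, like Page.setRepositionNeeded()
        boolean latched;

        /** Purge a committed-deleted row; cached slot numbers may now be stale. */
        void purge(int key) {
            if (keys.remove(Integer.valueOf(key))) {
                repositionNeeded = true;
            }
        }
    }

    /** Scan position: the key last returned and its cached slot. */
    static final class Position {
        final int key;
        int slot;
        Position(int key, int slot) { this.key = key; this.slot = slot; }
    }

    /**
     * "Latch" the page and re-find pos.key. Returns false, without latching,
     * if the row is gone and missingOk is false. With missingOk true, a
     * missing row leaves pos on its left neighbour (slot -1 = before first).
     */
    static boolean reposition(LeafPage page, Position pos, boolean missingOk) {
        if (page.repositionNeeded) {
            int found = Collections.binarySearch(page.keys, pos.key);
            if (found >= 0) {
                pos.slot = found;
            } else if (missingOk) {
                pos.slot = -(found + 1) - 1; // insertion point minus one
            } else {
                return false;
            }
        }
        page.latched = true;
        return true;
    }

    /** Like fetch(): fail if the row was purged; unlatch only if we latched. */
    static int fetchCurrent(LeafPage page, Position pos) {
        boolean latched = reposition(page, pos, false);
        try {
            if (!latched) {
                throw new IllegalStateException("record not found"); // cf. AM_RECORD_NOT_FOUND
            }
            return page.keys.get(pos.slot);
        } finally {
            if (latched) {
                page.latched = false; // release only a latch we actually hold
            }
        }
    }

    /** Like fetchRows(): return the keys after pos, stepping past a purged row. */
    static List<Integer> fetchRemaining(LeafPage page, Position pos) {
        if (!reposition(page, pos, false)) {
            // Row purged while we waited: back up to its left neighbour, so the
            // forward step below lands just past where the purged row was.
            reposition(page, pos, true);
        }
        List<Integer> out = new ArrayList<>();
        for (int s = pos.slot + 1; s < page.keys.size(); s++) {
            out.add(page.keys.get(s));
        }
        page.latched = false; // done reading; release the latch
        return out;
    }

    public static void main(String[] args) {
        LeafPage page = new LeafPage();
        for (int k = 10; k <= 50; k += 10) {
            page.keys.add(k);
        }
        Position pos = new Position(30, 2); // scan has just returned key 30
        page.purge(30);                     // purged while the scan waited for a lock
        System.out.println(fetchRemaining(page, pos)); // prints [40, 50]
    }
}
```

The toy model also shows why item 1 matters: if purge() did not set repositionNeeded, reposition() would trust the stale cached slot 2, which now holds 40, and the scan would silently skip that row.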

I tested that reposition(pos,false) followed by reposition(pos,true) worked by
setting a breakpoint in the debugger and manually changing values in the page
object and in the position to make the scan code believe that the row had been
purged. As far as I could tell, it worked just as if the scan had found a
deleted row. (There are currently no tests that exercise code paths where
reposition() returns false, and I don't see any easy way to write a test for it
since it would be highly dependent on timing between user threads and service
threads.)

This patch fixes all the issues I'm aware of in the previous patch. Derbyall
and suites.All ran cleanly. Reviews, comments and questions would be
appreciated. Thanks.

> Index split deadlock
> --------------------
>
>                 Key: DERBY-2991
>                 URL: https://issues.apache.org/jira/browse/DERBY-2991
>             Project: Derby
>          Issue Type: Bug
>          Components: Store
>    Affects Versions: 10.2.2.0, 10.3.1.4
>         Environment: Windows XP, Java 6
>            Reporter: Bogdan Calmac
>            Assignee: Knut Anders Hatlen
>         Attachments: d2991-2a.diff, d2991-2a.stat, d2991-2b.diff, d2991-2b.stat, d2991-preview-1a.diff,
>                      d2991-preview-1a.stat, d2991-preview-1b.diff, d2991-preview-1b.stat, d2991-preview-1c.diff,
>                      d2991-preview-1c.stat, d2991-preview-1d.diff, d2991-preview-1d.stat, d2991-preview-1e.diff,
>                      derby.log, InsertSelectDeadlock.java, perftest.diff, Repro2991.java, stacktraces_during_deadlock.txt,
>                      test-1.diff, test-2.diff, test-3.diff
>
>
> After doing some research on the mailing list, it appears that the index split deadlock
> is a known behaviour, so I will start by describing the theoretical problem first and then
> follow with the details of my test case.
> If you have concurrent select and insert transactions on the same table, the observed
> locking behaviour is as follows:
>  - the select transaction acquires an S lock on the root block of the index and then
>    waits for an S lock on some uncommitted row of the insert transaction
>  - the insert transaction acquires X locks on the inserted records and, if it needs to
>    do an index split, creates a sub-transaction that tries to acquire an X lock on the
>    root block of the index
> In summary: INDEX LOCK followed by ROW LOCK + ROW LOCK followed by INDEX LOCK = deadlock
> In the case of my project this is an important issue (lack of concurrency after being
> forced to use table-level locking), and I would like to contribute to the project and fix
> this issue (if possible). I was wondering if someone who knows the code could give me a
> few pointers on the implications of this issue:
>  - Is this a limitation of the top-down algorithm used?
>  - Would fixing it require using a bottom-up algorithm for better concurrency (which
>    is certainly non-trivial)?
>  - Trying to break the circular locking above, I would first question why the select
>    transaction needs to acquire (and hold) a lock on the root block of the index. Would it
>    be possible to ensure the consistency of the select without locking the index?
> -----
> The attached test (InsertSelectDeadlock.java) tries to simulate a typical data collection
> application. It consists of:
>  - an insert thread that inserts records in batches
>  - a select thread that 'processes' the records inserted by the other thread:
>    'select * from table where id > ?'
> The derby log provides details about the deadlock trace, and stacktraces_during_deadlock.txt
> shows that the insert thread is doing an index split.
> The test was run on 10.2.2.0 and 10.3.1.4 with identical behaviour.
> Thanks,
> Bogdan Calmac.
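The lock cycle in the quoted report ("INDEX LOCK followed by ROW LOCK + ROW LOCK followed by INDEX LOCK") can be illustrated with a small wait-for graph. This is a hypothetical sketch, not Derby's lock manager: WaitForSketch and its resource names are invented, it treats the split's sub-transaction as acting on behalf of the insert transaction, and it records a single holder per resource even though real S locks can be shared.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical wait-for graph illustrating the lock cycle in the report. */
public class WaitForSketch {
    private final Map<String, String> holder = new HashMap<>();   // resource -> txn
    private final Map<String, String> waitsFor = new HashMap<>(); // txn -> resource

    void hold(String txn, String resource) { holder.put(resource, txn); }
    void waitFor(String txn, String resource) { waitsFor.put(txn, resource); }

    /** Follow txn -> resource -> holder edges; returning to start means deadlock. */
    boolean deadlocked(String start) {
        Set<String> seen = new HashSet<>();
        String txn = start;
        while (txn != null && seen.add(txn)) {
            String resource = waitsFor.get(txn);
            if (resource == null) {
                return false; // this transaction is not waiting, so no cycle
            }
            txn = holder.get(resource);
            if (start.equals(txn)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        WaitForSketch g = new WaitForSketch();
        g.hold("select", "index root");    // S lock on the root block
        g.hold("insert", "row 42");        // X lock on an uncommitted row
        g.waitFor("select", "row 42");     // select waits for the row lock
        g.waitFor("insert", "index root"); // split wants an X lock on the root
        System.out.println(g.deadlocked("select")); // prints true
    }
}
```

Removing either wait edge breaks the cycle, which is why the report asks whether the select really needs to hold its lock on the root block.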

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
