geode-issues mailing list archives

From "Dan Smith (JIRA)" <>
Subject [jira] [Reopened] (GEODE-4928) DistributedLockService doesn't work as expected while the dlock grantor is initialized
Date Thu, 02 Aug 2018 18:46:00 GMT


Dan Smith reopened GEODE-4928:
      Assignee:     (was: Bruce Schuchardt)

The fix attempted for this doesn't quite work, and here's why:

When a member/client leaves, its departure causes both the transaction lock service and the
DLock service to clean up whatever locks that member held. DLocks are cleaned up rather quickly,
whereas transaction lock cleanup happens asynchronously in the background. The fix was to have
the transaction lock grantor set some state telling the DLock grantor that transaction locks
are still being cleaned up, and to have the DLock grantor wait on that state. However, the DLock
grantor can get past that check before the transaction lock grantor has set the state, meaning
that the window is reduced but not closed.
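
To make the remaining window concrete, here is a minimal single-threaded sketch of the bad interleaving. It is not Geode code: the class, field and method names are all hypothetical, and the two "grantors" are just method calls made in the problematic order.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical model of the flag-based handshake; none of these names come from Geode.
class FlagWindowSketch {
    // Set by the (hypothetical) transaction lock grantor once it starts cleaning up
    // the departed member's transaction locks.
    private final AtomicBoolean txCleanupInProgress = new AtomicBoolean(false);

    // The DLock grantor's check: true means it would grant the dlock without waiting.
    boolean dlockGrantorWouldGrantImmediately() {
        return !txCleanupInProgress.get();
    }

    // The transaction lock grantor notices the departure and announces its cleanup.
    void txGrantorStartsCleanup() {
        txCleanupInProgress.set(true);
    }

    public static void main(String[] args) {
        FlagWindowSketch sketch = new FlagWindowSketch();

        // Bad interleaving: the member has left, but the DLock grantor reaches its
        // check before the transaction lock grantor has set the flag.
        boolean grantedEarly = sketch.dlockGrantorWouldGrantImmediately();
        sketch.txGrantorStartsCleanup();

        // prints "dlock granted before tx cleanup was announced: true"
        System.out.println("dlock granted before tx cleanup was announced: " + grantedEarly);
    }
}
```

A flag that is still false cannot distinguish "no cleanup is needed" from "cleanup has not been announced yet", which is why the check only narrows the window.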

In addition to the above problem, if the two lock grantors are actually hosted by different
members, making the DLock grantor wait on state set by a transaction lock grantor in the same
process doesn't help, because the transaction lock grantor is not in that process.

The test written for this issue demonstrates these problems sporadically; see GEODE-5470.

I don't think we ever really designed these two services to work together that way, so I think
this would be a feature to be implemented, not a bug to be fixed. Maybe we could have DLock
and Transaction locks synchronize on a view number, rather than a boolean flag, but that requires
new messages and significant reworking of how views are processed. Maybe we could mark certain
DLock services as having to be synchronized with transaction locks?
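
As a rough illustration of the view-number idea, the sketch below has the DLock side compare the membership view it is working in against the highest view the transaction lock side reports as fully cleaned up. Everything here is hypothetical (names, the single-process setup, the use of a plain long as a view id); in a real cluster the report would have to arrive via the new messages mentioned above.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch of synchronizing on a view number instead of a boolean flag.
class ViewSyncSketch {
    // Highest membership view for which transaction lock cleanup is known to be complete.
    private final AtomicLong txCleanedUpThroughView = new AtomicLong(0);

    // Called when the transaction lock side reports that cleanup for viewId is done.
    // Stale (lower) reports are ignored by keeping the maximum.
    void txCleanupFinishedThroughView(long viewId) {
        txCleanedUpThroughView.accumulateAndGet(viewId, Math::max);
    }

    // The DLock grantor may grant only once cleanup has caught up with its own view.
    boolean dlockMayGrant(long dlockCurrentView) {
        return txCleanedUpThroughView.get() >= dlockCurrentView;
    }

    public static void main(String[] args) {
        ViewSyncSketch sync = new ViewSyncSketch();
        long viewWhereMemberLeft = 5;

        // Nothing reported yet: the DLock grantor must wait, even though no flag was set.
        System.out.println(sync.dlockMayGrant(viewWhereMemberLeft)); // prints "false"

        sync.txCleanupFinishedThroughView(viewWhereMemberLeft);
        System.out.println(sync.dlockMayGrant(viewWhereMemberLeft)); // prints "true"
    }
}
```

Unlike the flag, the default state here is "not yet safe": the DLock grantor waits until it hears that cleanup has reached its view, so a late announcement delays the grant instead of letting it through.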

> DistributedLockService doesn't work as expected while the dlock grantor is initialized
> --------------------------------------------------------------------------------------
>                 Key: GEODE-4928
>                 URL:
>             Project: Geode
>          Issue Type: Bug
>          Components: distributed lock service
>            Reporter: Bruce Schuchardt
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 1.6.0
>          Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> I wrote a function that obtained a dlock and then performed a transaction. It always
> operates on the same dlock key and the same keys in my region. That protects against getting
> a commit conflict exception BUT this sometimes fails if the JVM holding the lock crashes.
> One of the functions appears to get the dlock okay but then its transaction fails when it
> goes to commit.

This message was sent by Atlassian JIRA