geode-issues mailing list archives

From "ASF subversion and git services (JIRA)" <>
Subject [jira] [Commented] (GEODE-1885) Missing subscription event with Offheap partitioned region during bucket rebalance.
Date Tue, 27 Sep 2016 23:36:20 GMT


ASF subversion and git services commented on GEODE-1885:

Commit 55a65840a4e4d427acaed1182aca869bf92ecae6 in incubator-geode's branch refs/heads/feature/GEODE-1801
from [~dschneider]
[;h=55a6584 ]

GEODE-1885: fix infinite loop

The previous fix for GEODE-1885 introduced a hang on off-heap regions.
If a concurrent close/destroy of the region happens while other threads
are modifying it, then a thread doing the modification can get stuck
in a hot loop that never terminates.
The hot loop is in AbstractRegionMap, where the thread tests the existing
region entry it finds to see whether that entry can be modified.
If the region entry has a value that marks it as removed,
then the operation spins around and tries again.
It expects the thread that marked the entry as removed
to also remove it from the map.
The previous fix for GEODE-1885 can prevent that removal from happening.
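A minimal Java sketch of the retry behavior described above, assuming a ConcurrentHashMap stands in for the region map. The names HotLoopModel, Entry, Token and modifyWithRetry are hypothetical, not Geode's API, and the spin bound exists only so the demo terminates. It shows why the loop never exits: if the thread that marked an entry as removed never takes it out of the map, every retry finds the same removed entry.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical, simplified model of the retry loop; not Geode's AbstractRegionMap code.
final class HotLoopModel {
    // Stand-ins for the entry states named in the commit message.
    enum Token { VALUE, REMOVED_PHASE1, REMOVED_PHASE2 }

    static final class Entry {
        volatile Token token;

        Entry(Token token) {
            this.token = token;
        }

        boolean isRemoved() {
            return token != Token.VALUE;
        }
    }

    // Returns the attempt on which a usable entry was found, or -1 after maxSpins
    // attempts. The bound exists only so the demo terminates; the loop described
    // in the commit message has no such bound.
    static int modifyWithRetry(ConcurrentMap<String, Entry> map, String key, int maxSpins) {
        for (int attempt = 1; attempt <= maxSpins; attempt++) {
            Entry existing = map.putIfAbsent(key, new Entry(Token.VALUE));
            if (existing == null || !existing.isRemoved()) {
                return attempt; // new entry added, or a live entry we could modify
            }
            // Entry is marked removed: spin and retry, trusting the thread that
            // marked it to also take it out of the map.
        }
        return -1;
    }

    public static void main(String[] args) {
        ConcurrentMap<String, Entry> map = new ConcurrentHashMap<>();
        // A removed entry that its marking thread never takes out of the map.
        map.put("k", new Entry(Token.REMOVED_PHASE2));
        System.out.println(modifyWithRetry(map, "k", 1_000_000)); // prints -1
    }
}
```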
So this fix does two things:
 1. On retry remove the existing removed region entry from the map.
 2. putEntryIfAbsent now only releases the current entry if it has an off-heap reference.
    This prevents an infinite loop that occurred when the current thread, having just added
    a new entry with REMOVE_PHASE1, released it (changing it to REMOVE_PHASE2)
    because it saw that the region was closed/destroyed.
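The two changes listed above can be sketched the same way. This is a simplified model under the same assumptions (hypothetical names, a ConcurrentHashMap in place of the region map, and a boolean flag standing in for "has an off-heap reference"), not the actual AbstractRegionMap or putEntryIfAbsent code.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical, simplified model of the two changes; names are illustrative only.
final class RetryFixModel {
    enum Token { VALUE, REMOVED_PHASE1, REMOVED_PHASE2 }

    static final class Entry {
        volatile Token token;
        final boolean hasOffHeapReference;

        Entry(Token token, boolean hasOffHeapReference) {
            this.token = token;
            this.hasOffHeapReference = hasOffHeapReference;
        }

        boolean isRemoved() {
            return token != Token.VALUE;
        }

        // Models "releasing" an entry: it moves to REMOVED_PHASE2.
        void release() {
            token = Token.REMOVED_PHASE2;
        }
    }

    // Change 1: on retry, the retrying thread removes the stale removed entry itself.
    // ConcurrentMap.remove(key, value) only succeeds if the key still maps to that
    // exact entry, so a fresh entry installed by another thread is left alone.
    static int modifyWithRetry(ConcurrentMap<String, Entry> map, String key, int maxSpins) {
        for (int attempt = 1; attempt <= maxSpins; attempt++) {
            Entry existing = map.putIfAbsent(key, new Entry(Token.VALUE, false));
            if (existing == null || !existing.isRemoved()) {
                return attempt;
            }
            map.remove(key, existing);
        }
        return -1;
    }

    // Change 2: when the region is seen as closed/destroyed, release the entry
    // only if it holds an off-heap reference.
    static void releaseIfRegionClosed(Entry current, boolean regionClosed) {
        if (regionClosed && current.hasOffHeapReference) {
            current.release();
        }
    }

    public static void main(String[] args) {
        ConcurrentMap<String, Entry> map = new ConcurrentHashMap<>();
        map.put("k", new Entry(Token.REMOVED_PHASE2, false));
        System.out.println(modifyWithRetry(map, "k", 10)); // prints 2

        // An entry just added with REMOVED_PHASE1 and no off-heap reference stays in PHASE1.
        Entry justAdded = new Entry(Token.REMOVED_PHASE1, false);
        releaseIfRegionClosed(justAdded, true);
        System.out.println(justAdded.token); // prints REMOVED_PHASE1
    }
}
```

Using the two-argument ConcurrentMap.remove(key, value) is one way to keep the cleanup from discarding an entry that another thread has already installed in place of the stale one; the commit message does not say whether the real change does the same.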

> Missing subscription event with Offheap partitioned region during bucket rebalance.
> -----------------------------------------------------------------------------------
>                 Key: GEODE-1885
>                 URL:
>             Project: Geode
>          Issue Type: Bug
>          Components: offheap
>            Reporter: Anilkumar Gingade
>            Assignee: Darrel Schneider
>             Fix For: 1.0.0-incubating
> During a transaction operation, if a concurrent redundant bucket rebalance is in progress, the client can miss a subscription event if its primary queue is hosted on the node the bucket is moved from.
> Consider a three-node cluster with nodes N1, N2, and N3, where:
> - Client C1 is connected to node N2.
> - The primary bucket region B1 is on N1, and the secondary bucket for B1 is on N2.
> - A transaction is started on N2 that creates an entry in B1.
> - The TX is committed while, at the same time, bucket B1 on N2 is moved to N3.
> - The TX commit message from N1 is sent to N2. It also includes the subscription message for client C1.
> - On N2, for an off-heap region, when the bucket is not found locally, an exception response is sent back to N1 without the subscription message being processed.
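A toy Java model of the failure path in the last step of the scenario above. CommitMessage, Reply and handle are hypothetical names, and the Set of locally hosted bucket names and the List standing in for C1's subscription queue are simplifying assumptions. It only illustrates the reported behavior, not the eventual fix.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Toy model of the reported failure path on N2; not Geode's message classes.
final class CommitMessageModel {
    static final class CommitMessage {
        final String bucket;
        final String subscriptionEvent;

        CommitMessage(String bucket, String subscriptionEvent) {
            this.bucket = bucket;
            this.subscriptionEvent = subscriptionEvent;
        }
    }

    enum Reply { OK, BUCKET_NOT_FOUND }

    // As described in the report: if the bucket is no longer hosted locally, an
    // exception response goes back to the sender and the subscription message
    // is never processed, so it never reaches the client's queue.
    static Reply handle(CommitMessage msg, Set<String> localBuckets, List<String> clientQueue) {
        if (!localBuckets.contains(msg.bucket)) {
            return Reply.BUCKET_NOT_FOUND;
        }
        clientQueue.add(msg.subscriptionEvent);
        return Reply.OK;
    }

    public static void main(String[] args) {
        CommitMessage msg = new CommitMessage("B1", "create k1");

        List<String> queueBefore = new ArrayList<>();
        // B1 still hosted on N2: the event reaches C1's queue.
        System.out.println(handle(msg, Set.of("B1"), queueBefore) + " " + queueBefore); // prints OK [create k1]

        List<String> queueAfter = new ArrayList<>();
        // B1 already moved to N3: the event is dropped.
        System.out.println(handle(msg, Set.of(), queueAfter) + " " + queueAfter); // prints BUCKET_NOT_FOUND []
    }
}
```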

This message was sent by Atlassian JIRA
