geode-issues mailing list archives

From "ASF subversion and git services (JIRA)" <>
Subject [jira] [Commented] (GEODE-6975) When a redundant copy or replica of a distributed region fails to persist a remote member's new persistent ID, it should send a reply exception back to indicate what happened
Date Tue, 16 Jul 2019 23:43:00 GMT


ASF subversion and git services commented on GEODE-6975:

Commit a4f98c78dcd1f624afa5745ee65299d40e080df5 in geode's branch refs/heads/feature/GEODE-6975
from eshu

GEODE-6975: Reply with an exception if the sender's new ID fails to persist.

 * Reply with a ReplyException if processing PrepareNewPersistentMemberMessage fails.
 * Retry if the remote member failed to persist the new persistent member ID.
 * Make sure the atomicCreation flag is reset, so that redundancy can be satisfied
   during the bucket creation retry if the previous attempt failed.
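
The behavior described in the list above can be illustrated with a small, self-contained
sketch. It is not Geode code: PersistentIdHandshakeSketch, Receiver, Sender, ReplyFailure and
switchId are hypothetical names invented for this illustration, and the retry here simply
contacts the same peer again, which is an assumption rather than what the commit necessarily does.
The point it shows is that the sender keeps its old ID until the receiver confirms that the new
ID was persisted.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch: none of these types are real Geode classes.
class PersistentIdHandshakeSketch {

  /** Stand-in for the exception a receiver sends back instead of a success reply. */
  static class ReplyFailure extends RuntimeException {
    ReplyFailure(String message) {
      super(message);
    }
  }

  /** Receiver side ("member B"): records the persistent IDs of its peers. */
  static class Receiver {
    final Set<String> persistedPeerIds = new HashSet<>();
    int failuresRemaining; // simulates a member that is shutting down

    Receiver(int failuresRemaining) {
      this.failuresRemaining = failuresRemaining;
    }

    void prepareNewPersistentMember(String newId) {
      if (failuresRemaining > 0) {
        failuresRemaining--;
        // Report the failure instead of replying as if the ID had been persisted.
        throw new ReplyFailure("could not persist " + newId);
      }
      persistedPeerIds.add(newId);
    }
  }

  /** Sender side ("member A"): holds an old ID and wants to switch to a new one. */
  static class Sender {
    final Set<String> myPersistedIds = new HashSet<>();

    Sender(String oldId) {
      myPersistedIds.add(oldId);
    }

    /** Returns true only if the peer confirmed the new ID within maxAttempts tries. */
    boolean switchId(String oldId, String newId, Receiver peer, int maxAttempts) {
      for (int attempt = 1; attempt <= maxAttempts; attempt++) {
        try {
          peer.prepareNewPersistentMember(newId);
          // Only after a successful reply is it safe to drop the old ID.
          myPersistedIds.add(newId);
          myPersistedIds.remove(oldId);
          return true;
        } catch (ReplyFailure e) {
          // The peer did not persist the new ID: keep the old ID and retry.
        }
      }
      return false;
    }
  }

  public static void main(String[] args) {
    Receiver memberB = new Receiver(1); // first attempt fails, second succeeds
    Sender memberA = new Sender("A-old");
    boolean switched = memberA.switchId("A-old", "A-new", memberB, 3);
    System.out.println("switched=" + switched + " A=" + memberA.myPersistedIds
        + " B knows=" + memberB.persistedPeerIds);
  }
}
```

If every attempt fails, switchId returns false and the sender still holds only its old ID, so
both members continue to agree on an ID that the receiver has actually persisted.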

> When a redundant copy or replica of a distributed region fails to persist a remote
member's new persistent ID, it should send a reply exception back to indicate what happened
> -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>                 Key: GEODE-6975
>                 URL:
>             Project: Geode
>          Issue Type: Bug
>          Components: persistence, regions
>    Affects Versions: 1.1.0
>            Reporter: Eric Shu
>            Assignee: Eric Shu
>            Priority: Major
>              Labels: GeodeCommons
> Currently, when a persistent bucket or distributed region is created on member A, member
A sends its new PersistentMemberID to the other hosts (e.g. member B), so that member B
knows and persists A's new ID for the region.
> However, if member B is being shut down while processing the PrepareNewPersistentMemberMessage
(and so does not persist A's ID), it still sends a reply message indicating that it persisted the ID.
This causes member A to remove its old member ID and persist only its new member ID. This is wrong,
because member A could also be shut down at the same time. There is then a race in which member B
could be recognized as hosting the last copy of the region. In that case member B recovers
first, and it can only recover member A's old persistent ID. As a result, member A is
unable to restart, because B does not recognize A's new persistent ID.
> [error 2018/09/19 01:18:00.972 PDT dataStoregemfire6_host1_6131 <Recovery thread for
bucket _B__partitionedRegion_0> tid=0x77] A DiskAccessException has occurred while writing
to the disk for region /__PR/_B__partitionedRegion_0. The cache will be closed.
> org.apache.geode.cache.persistence.ConflictingPersistentDataException: Region /__PR/_B__partitionedRegion_0
remote member rs-FullRegression19041704a3i3large-hydra-client-62(dataStoregemfire1_host1_5862:5862)<ec><v8>:1025
with persistent data /
created at timestamp 1537345060760 version 0 diskStoreId a35a937a082b4066-af019365b6a5114b
name null was not part of the same distributed system as the local data from /
created at timestamp 1537344996470 version 0 diskStoreId 108be5a03966418f-980c1d88e9b26d1d
name null
>         at org.apache.geode.internal.cache.persistence.PersistenceAdvisorImpl.checkMyStateOnMembers(
>         at org.apache.geode.internal.cache.persistence.PersistenceInitialImageAdvisor.removeReplicatesIfWeAreEqualToAnyOrElseClearEqualMembers(
>         at org.apache.geode.internal.cache.persistence.PersistenceInitialImageAdvisor.getAdvice(
>         at org.apache.geode.internal.cache.persistence.PersistenceAdvisorImpl.getInitialImageAdvice(
>         at org.apache.geode.internal.cache.persistence.CreatePersistentRegionProcessor.getInitialImageAdvice(
>         at org.apache.geode.internal.cache.DistributedRegion.getInitialImageAndRecovery(
>         at org.apache.geode.internal.cache.DistributedRegion.initialize(
>         at org.apache.geode.internal.cache.BucketRegion.initialize(
>         at org.apache.geode.internal.cache.LocalRegion.createSubregion(
>         at org.apache.geode.internal.cache.PartitionedRegionDataStore.createBucketRegion(
>         at org.apache.geode.internal.cache.PartitionedRegionDataStore.grabFreeBucket(
>         at org.apache.geode.internal.cache.PartitionedRegionDataStore.grabBucket(
>         at org.apache.geode.internal.cache.ProxyBucketRegion.recoverFromDisk(
>         at org.apache.geode.internal.cache.ProxyBucketRegion.recoverFromDiskRecursively(
>         at org.apache.geode.internal.cache.PRHARedundancyProvider$4.run2(
>         at
>         at org.apache.geode.internal.cache.PRHARedundancyProvider$
>         at

This message was sent by Atlassian JIRA