Date: Tue, 16 Jul 2019 23:25:00 +0000 (UTC)
From: "Eric Shu (JIRA)"
To: issues@geode.apache.org
Subject: [jira] [Updated] (GEODE-6975) When a redundant copy or replica of a distributed region failed to persistent remote member's new persistence id, it should send reply exception back to indicate what happened

[ https://issues.apache.org/jira/browse/GEODE-6975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eric Shu updated GEODE-6975:
----------------------------
    Labels: GeodeCommons  (was: )

> When a redundant copy or replica of a distributed region failed to persistent remote member's new persistence id, it should send reply exception back to indicate what happened
> -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: GEODE-6975
>                 URL: https://issues.apache.org/jira/browse/GEODE-6975
>             Project: Geode
>          Issue Type: Bug
>          Components: persistence, regions
>    Affects Versions: 1.1.0
>            Reporter: Eric Shu
>            Assignee: Eric Shu
>            Priority: Major
>              Labels: GeodeCommons
>
> Currently, when a persistent bucket or distributed region is created on member A, member A sends its new PersistentMemberID to the other hosts (e.g. member B), so that member B learns and persists A's new ID for the region.
> However, if member B is being shut down while it processes the PrepareNewPersistentMemberMessage (and therefore does not persist A's ID), it still sends a reply message indicating that it has persisted the ID. As a result, member A removes its old member ID and persists only its new member ID. This is wrong, because member A could also be shut down at the same time. There is a race in which member B can be recognized as hosting the last copy of the region. Member B then recovers first, and it can only recover member A's old persistent ID. Member A is then unable to restart, because B does not recognize A's new persistent ID.
> [error 2018/09/19 01:18:00.972 PDT dataStoregemfire6_host1_6131 tid=0x77] A DiskAccessException has occurred while writing to the disk for region /__PR/_B__partitionedRegion_0. The cache will be closed.
> org.apache.geode.cache.persistence.ConflictingPersistentDataException: Region /__PR/_B__partitionedRegion_0 remote member rs-FullRegression19041704a3i3large-hydra-client-62(dataStoregemfire1_host1_5862:5862):1025 with persistent data /10.32.109.230:/var/vcap/data/rundir/concParRegHAPersistPdxVA57H/concParRegHAPersistPdx-0919-011540/vm_1_dataStore1_disk_1 created at timestamp 1537345060760 version 0 diskStoreId a35a937a082b4066-af019365b6a5114b name null was not part of the same distributed system as the local data from /10.32.109.230:/var/vcap/data/rundir/concParRegHAPersistPdxVA57H/concParRegHAPersistPdx-0919-011540/vm_6_dataStore6_disk_1 created at timestamp 1537344996470 version 0 diskStoreId 108be5a03966418f-980c1d88e9b26d1d name null
>     at org.apache.geode.internal.cache.persistence.PersistenceAdvisorImpl.checkMyStateOnMembers(PersistenceAdvisorImpl.java:521)
>     at org.apache.geode.internal.cache.persistence.PersistenceInitialImageAdvisor.removeReplicatesIfWeAreEqualToAnyOrElseClearEqualMembers(PersistenceInitialImageAdvisor.java:181)
>     at org.apache.geode.internal.cache.persistence.PersistenceInitialImageAdvisor.getAdvice(PersistenceInitialImageAdvisor.java:69)
>     at org.apache.geode.internal.cache.persistence.PersistenceAdvisorImpl.getInitialImageAdvice(PersistenceAdvisorImpl.java:831)
>     at org.apache.geode.internal.cache.persistence.CreatePersistentRegionProcessor.getInitialImageAdvice(CreatePersistentRegionProcessor.java:52)
>     at org.apache.geode.internal.cache.DistributedRegion.getInitialImageAndRecovery(DistributedRegion.java:1200)
>     at org.apache.geode.internal.cache.DistributedRegion.initialize(DistributedRegion.java:1081)
>     at org.apache.geode.internal.cache.BucketRegion.initialize(BucketRegion.java:258)
>     at org.apache.geode.internal.cache.LocalRegion.createSubregion(LocalRegion.java:1014)
>     at org.apache.geode.internal.cache.PartitionedRegionDataStore.createBucketRegion(PartitionedRegionDataStore.java:779)
>     at org.apache.geode.internal.cache.PartitionedRegionDataStore.grabFreeBucket(PartitionedRegionDataStore.java:454)
>     at org.apache.geode.internal.cache.PartitionedRegionDataStore.grabBucket(PartitionedRegionDataStore.java:2895)
>     at org.apache.geode.internal.cache.ProxyBucketRegion.recoverFromDisk(ProxyBucketRegion.java:447)
>     at org.apache.geode.internal.cache.ProxyBucketRegion.recoverFromDiskRecursively(ProxyBucketRegion.java:390)
>     at org.apache.geode.internal.cache.PRHARedundancyProvider$4.run2(PRHARedundancyProvider.java:1756)
>     at org.apache.geode.internal.cache.partitioned.RecoveryRunnable.run(RecoveryRunnable.java:58)
>     at org.apache.geode.internal.cache.PRHARedundancyProvider$4.run(PRHARedundancyProvider.java:1748)
>     at java.lang.Thread.run(Thread.java:748)

--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
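The race in GEODE-6975 can be reduced to a small, self-contained Java model. This is a hypothetical sketch, not Geode code: `PersistenceIdRace`, `Reply`, `PeerB`, `MemberA`, `aCanRestart`, and `simulate` are invented for illustration. It assumes that member A keeps its old ID whenever a peer replies with an exception; the issue itself only says that an exception reply should be sent. Recovery is collapsed into one check: A can rejoin only if B, which recovers first as the "last copy", recognizes at least one of A's persisted IDs. That check stands in for the `ConflictingPersistentDataException` shown above.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical model of the GEODE-6975 race; none of these types are Geode APIs.
public class PersistenceIdRace {

    // Reply to the (modeled) "prepare new persistent member" message.
    static final class Reply {
        final boolean persisted;
        final String exceptionMessage; // null when the peer persisted the ID

        private Reply(boolean persisted, String exceptionMessage) {
            this.persisted = persisted;
            this.exceptionMessage = exceptionMessage;
        }

        static Reply ok() { return new Reply(true, null); }
        static Reply exception(String msg) { return new Reply(false, msg); }
    }

    // Peer B: tracks which of A's persistent IDs it has actually written to disk.
    static final class PeerB {
        final Set<String> persistedIdsOfA = new LinkedHashSet<>();
        final boolean sendExceptionOnFailure; // false models the behavior reported in the issue
        boolean shuttingDown;

        PeerB(boolean sendExceptionOnFailure) {
            this.sendExceptionOnFailure = sendExceptionOnFailure;
        }

        Reply handlePrepareNewId(String newIdOfA) {
            if (shuttingDown) {
                // B is closing, so the new ID is never persisted.
                return sendExceptionOnFailure
                        ? Reply.exception("shutting down; new ID not persisted")
                        : Reply.ok(); // reported bug: claims success anyway
            }
            persistedIdsOfA.add(newIdOfA);
            return Reply.ok();
        }
    }

    // Member A: the persistent IDs it keeps for its own copy of the region.
    static final class MemberA {
        final Set<String> myPersistedIds = new LinkedHashSet<>();

        MemberA(String oldId) {
            myPersistedIds.add(oldId);
        }

        void createWithNewId(String oldId, String newId, PeerB peer) {
            myPersistedIds.add(newId);
            Reply reply = peer.handlePrepareNewId(newId);
            if (reply.persisted) {
                // Dropping the old ID is only safe if the peer really stored the new one.
                myPersistedIds.remove(oldId);
            }
            // Assumption: on an exception reply, A keeps both its old and new IDs.
        }
    }

    // B recovers first; A may rejoin only if B recognizes one of A's persisted IDs.
    static boolean aCanRestart(MemberA a, PeerB b) {
        for (String id : a.myPersistedIds) {
            if (b.persistedIdsOfA.contains(id)) {
                return true;
            }
        }
        return false; // stands in for ConflictingPersistentDataException
    }

    public static boolean simulate(boolean sendExceptionOnFailure) {
        PeerB b = new PeerB(sendExceptionOnFailure);
        b.persistedIdsOfA.add("A-old"); // B already knows A's old ID
        MemberA a = new MemberA("A-old");
        b.shuttingDown = true;          // B shuts down while handling A's message
        a.createWithNewId("A-old", "A-new", b);
        // Both members then go down, and B is recovered first as the "last copy".
        return aCanRestart(a, b);
    }

    public static void main(String[] args) {
        System.out.println("reported behavior, A can restart: " + simulate(false)); // false
        System.out.println("exception reply, A can restart: " + simulate(true));    // true
    }
}
```

Under the reported behavior, A discards `A-old` after a misleading success reply, so B (which only ever stored `A-old`) cannot recognize A. With an exception reply, A still holds `A-old` and can rejoin. The real fix may handle the exception reply differently; the model only shows why a truthful reply matters.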