geode-issues mailing list archives

From "Udo Kohlmeyer (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (GEODE-3003) Geode doesn't start after cluster restart when using cluster-configuration
Date Wed, 07 Jun 2017 15:12:18 GMT

    [ https://issues.apache.org/jira/browse/GEODE-3003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16041025#comment-16041025 ]

Udo Kohlmeyer commented on GEODE-3003:
--------------------------------------

[~Neighbour],
The second issue you see is expected behavior. As I mentioned in my previous comment, when
you shut down a cluster, you need to restart the servers in the opposite order of shutdown.
In this case you shut down host1-locator and host1-server1, and then host2-locator1 and host2-server1.
Upon restart, host2-locator1 is "expected" to be started first, because it was the last
locator to shut down and potentially has newer data than host1-locator1 would
have. That is why you are seeing the "Region /_ConfigurationRegion has potentially stale data.
It is waiting for another member to recover the latest data." log message.
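
As a rough illustration of the ordering rule above, here is a minimal shell sketch that
reverses a recorded shutdown order. The restart_order function is a hypothetical helper
written for this example, not a Geode or gfsh command, and the member names are the
ones used in this thread.

```shell
#!/bin/sh
# Sketch only: print members in the reverse of the order they were shut down.
# restart_order is a hypothetical helper, not a Geode or gfsh command.
restart_order() {
    reversed=""
    for member in "$@"; do
        # Prepend each member so the last one stopped ends up first.
        reversed="$member${reversed:+ $reversed}"
    done
    echo "$reversed"
}

# Locators listed in the order they were stopped in this thread.
restart_order host1-locator host2-locator1
# prints: host2-locator1 host1-locator
```

The output lists host2-locator1 first, which matches the start order described above.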

Because of the GEODE-2238 issue, my host1-server1 consistently fails to restart, with the
"org.apache.geode.GemFireConfigException: cluster configuration service not available"
message.

> Geode doesn't start after cluster restart when using cluster-configuration
> --------------------------------------------------------------------------
>
>                 Key: GEODE-3003
>                 URL: https://issues.apache.org/jira/browse/GEODE-3003
>             Project: Geode
>          Issue Type: Bug
>          Components: configuration, membership
>            Reporter: Anton Mironenko
>            Assignee: Jinmei Liao
>            Priority: Blocker
>         Attachments: 20170522-geode-klyazma.zip, 20170522-geode-vyazma.zip, geode-host1.zip, geode-host2.zip, readme.txt
>
>
> There is a two-host Geode cluster with locator and server on each host.
> First start of all nodes goes well.
> Then all nodes are gracefully stopped (kill [locator-PID] [server-PID]).
> The second start goes wrong: the locator on the first host never joins the rest
> of the cluster, and its log shows the error:
> "Region /_ConfigurationRegion has potentially stale data. It is waiting for another member
> to recover the latest data."
> And sometimes (once per 5 starts) a server shuts down just after start with the error
> "org.apache.geode.GemFireConfigException: cluster configuration service not available".
> This bug started appearing only when we moved to Geode 1.1.1. And it totally blocks us.
> On GemFire 8.2.1 there was no such bug.
> This is very easy to reproduce.
> Test preparation:
> ---------------------
> Here are two attached zip files - "geode-host1.zip" and "geode-host2.zip"
> 1) unzip "geode-host1.zip" into some folder on your first host
> 2) in start-locator.sh change the IPs of locators to the values of your host1 and host2
> "--locators=10.50.3.38[20236],10.50.3.14[20236]"
> 3) in start-server.sh change the IPs of locators to the values of your host1 and host2
> "locators=10.50.3.38[20236],10.50.3.14[20236]"
> 4) repeat steps 1)-3) for host2; the folder where you unzip the file should be the
> same as on the first host
> Test running:
> ---------------
> 1) rm -rf {locator0,server1}
> 2) run ./start-locator.sh; ./start-server.sh on host1, then on host2. See that this cluster
> start is successful.
> 3) kill locator and server processes first on host1, then on host2
> kill [locator-PID] [server-PID]
> 4) run ./start-locator.sh; ./start-server.sh on host1, then on host2
> 5) see that there are actually two clusters, "host1-locator" and "host1-server, host2-locator,
> host2-server", instead of one cluster. And sometimes there is no "host1-server", because it
> shut down with the error "Region /_ConfigurationRegion has potentially stale data. It is waiting
> for another member to recover the latest data.".



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
