geode-dev mailing list archives

From Anthony Baker <aba...@pivotal.io>
Subject Re: Question about rolling back a Geode upgrade
Date Wed, 09 Oct 2019 19:50:47 GMT
Hi Alberto!

Another experiment that might be useful to try is changing a p2p message following [1].  If
you follow the steps in the wiki, a rolling upgrade should work ok.  But if you then try to
do a rolling downgrade, what happens?


Anthony

[1] https://cwiki.apache.org/confluence/display/GEODE/Managing+Backward+Compatibility


> On Sep 26, 2019, at 9:37 AM, Alberto Gomez <alberto.gomez@est.tech> wrote:
> 
> Hi again,
> 
> I have looked further into the possibility of supporting 
> "rolling downgrades" in Geode, similar to rolling upgrades, and I would 
> like to share my findings and also ask for some help.
> 
> My tests were done upgrading from Geode 1.10 to a recent version in the 
> develop branch and rolling back (downgrading) to 1.10. I was using one 
> locator and two servers. I am sure my findings would have been different 
> if I used other Geode versions or another configuration.
> 
> By making some changes in the code, I managed to roll back the servers 
> but I ran into trouble when starting the old locator.
> 
> The changes I made were the following:
> 
> - I removed the check for equality for the local and remote versions of 
> Geode in ConnectCommand::connect() so that it was allowed to connect to 
> Geode with a newer or older version of gfsh.
> - I started the locators and servers with the 
> gemfire.allow_old_members_to_join_for_testing property to allow old 
> members to join a newer Geode system.
> - I changed the Version::fromOrdinal method to return CURRENT instead of 
> throwing an exception when the ordinal passed corresponds to an 
> unsupported version. I had to make this change so that old servers could 
> proceed when reading oplogs generated by newer servers.
> 
> After downgrading the servers successfully, I stopped the new locator, 
> started the old one (with the old gfsh) and got an exception in the 
> locator when reading from the view file:
> 
> The Locator process terminated unexpectedly with exit status 1. Please 
> refer to the log file in 
> /home/alberto/geode/geode-releases/apache-geode-1.0.0/locator1 for full 
> details.
> 
> Exception in thread "main" org.apache.geode.InternalGemFireException: 
> Unable to recover previous membership view from 
> /home/alberto/geode/geode-releases/apache-geode-1.10.0/locator1/locator10334view.dat
> 
>     at 
> org.apache.geode.distributed.internal.membership.gms.locator.GMSLocator.recoverFromFile(GMSLocator.java:492)
>     ...
>     Caused by: java.io.StreamCorruptedException: invalid type code: 02
> 
>     at 
> java.io.ObjectInputStream$BlockDataInputStream.readBlockHeader(ObjectInputStream.java:2871)
>     ...
> 
> I think the problem is in the deserialization due to the fact that the 
> format of the locator's view file has changed between both Geode 
> versions after GEODE-7090.
> 
> This leads me to think that I might have been successful in the "rolling 
> downgrade" if I had selected other versions of Geode or I might have run 
> into a different set of problems.
> 
> After this research I would like to get some feedback from the community 
> on the following questions:
> 
> - Would it be reasonable to restrict future changes in Geode between 
> minor versions so that the rolling downgrade is supported? This would 
> imply that changes such as the one done in GEODE-7090 would not be 
> allowed for a minor version change.
> 
> - Could the code and configuration changes I made in my tests to 
> support the "rolling downgrade" have any negative side effects 
> which should dissuade us from using them?
> 
> - Are there any other things I have not taken into account that would 
> require changes in order to support rolling downgrades?
> 
> - Is it even feasible to implement "rolling downgrades" of Geode with 
> some restrictions, or are there always possible incompatibilities between 
> versions that make it impossible or unreasonably hard to support this 
> kind of feature?
> 
> Thanks in advance for your help,
> 
> -Alberto G.
> 
> On 23/9/19 17:04, Alberto Gomez wrote:
>> Hi Anthony,
>> 
>> That's an option but, as you say, the cost in infrastructure is high and
>> there are also other problems to solve like how to do the switch between
>> systems and how to assure the data consistency among them.
>> 
>> I was thinking that in many cases it might be possible to support a
>> rolling downgrade similar to the rolling upgrade given that the rolling
>> upgrade already allows the coexistence of old and new members in a cluster.
>> 
>> -Alberto
>> 
>> On 23/9/19 15:55, Anthony Baker wrote:
>>> Have you considered using a blue / green deployment approach?  It provides more
>>> flexibility for these scenarios though the infrastructure cost is high.
>>> 
>>> Anthony
>>> 
>>> 
>>>> On Sep 23, 2019, at 5:59 AM, Alberto Gomez <alberto.gomez@est.tech> wrote:
>>>> 
>>>> Hi,
>>>> 
>>>> Looking at the Geode documentation I have not found any reference to
>>>> rolling back a Geode upgrade.
>>>> 
>>>> Running some tests, I have observed that once a Geode System has been
>>>> upgraded to a later version, it is not possible to roll back the upgrade
>>>> even if no data modifications have been done after the upgrade.
>>>> 
>>>> The system protects itself in several places: gfsh does not allow you to
>>>> connect to a newer version of Geode, the Oplog files store the version
>>>> of the server, which prevents an older server from starting from a file
>>>> written by a newer server, the cluster does not allow older members to
>>>> join a cluster with newer members, and there are probably other
>>>> protections I did not hit.
>>>> 
>>>> Even if you tamper with some of those protections, you can run into
>>>> trouble due to compatibility issues. I ran into one when I lifted the
>>>> requirement to have the same gfsh versions, using versions 1.8 and 1.10,
>>>> because it seems some configuration is exchanged in JSON and its
>>>> format has changed between those two versions.
>>>> 
>>>> My question is whether supporting the rollback of Geode upgrades
>>>> (preferably in rolling mode) has ever been considered, at least between
>>>> systems under the same major version. In our experience customers often
>>>> require the rollback of upgrades.
>>>> 
>>>> Thanks in advance for your help,
>>>> 
>>>> -Alberto G.
>>>> 
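
The Version::fromOrdinal change described in the thread (return CURRENT instead of throwing for an unsupported ordinal) can be illustrated with a minimal standalone sketch. This is not Geode code: the class VersionLookupSketch, the type SketchVersion, the ordinal values, and both method names are hypothetical and only mirror the idea, seen from the side of an older member that reads an ordinal written by a newer one.

```java
import java.util.HashMap;
import java.util.Map;

public class VersionLookupSketch {

    /** Hypothetical stand-in for a product version identified by an ordinal. */
    static final class SketchVersion {
        final String name;
        final short ordinal;

        SketchVersion(String name, int ordinal) {
            this.name = name;
            this.ordinal = (short) ordinal;
        }
    }

    // Versions this (older) member knows about. The ordinals are made up.
    static final SketchVersion OLD = new SketchVersion("OLD", 100);
    static final SketchVersion CURRENT = new SketchVersion("CURRENT", 105);

    private static final Map<Short, SketchVersion> KNOWN = new HashMap<>();

    static {
        KNOWN.put(OLD.ordinal, OLD);
        KNOWN.put(CURRENT.ordinal, CURRENT);
    }

    /** Strict lookup: an unknown ordinal is rejected with an exception. */
    static SketchVersion fromOrdinalStrict(short ordinal) {
        SketchVersion version = KNOWN.get(ordinal);
        if (version == null) {
            throw new IllegalArgumentException("Unsupported version ordinal: " + ordinal);
        }
        return version;
    }

    /** Lenient lookup: an unknown ordinal is treated as this member's CURRENT. */
    static SketchVersion fromOrdinalLenient(short ordinal) {
        return KNOWN.getOrDefault(ordinal, CURRENT);
    }

    public static void main(String[] args) {
        short writtenByNewerMember = 110; // ordinal unknown to this member

        System.out.println("lenient: " + fromOrdinalLenient(writtenByNewerMember).name);
        try {
            fromOrdinalStrict(writtenByNewerMember);
        } catch (IllegalArgumentException e) {
            System.out.println("strict: " + e.getMessage());
        }
    }
}
```

In this sketch the lenient lookup lets the older member keep going, but it then interprets the data as if it had been written in its own format. That assumption only holds while the format has not changed between the two versions, which is relevant to the question above about negative side effects.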

