ambari-user mailing list archives

From Alejandro Fernandez <afernan...@hortonworks.com>
Subject Re: Ambari rolling upgrade usage feedback / finalize question
Date Tue, 24 Nov 2015 20:49:50 GMT
We don't have a process to uninstall old bits, and this area is largely
undocumented and untested.
You can try erasing the 2_2_6_* packages (via yum, zypper, or apt).
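
A minimal sketch of how the old packages could be picked out for review before erasing anything. It assumes the old packages carry the version tag in their names, as the 2_2_6_* pattern above suggests; the sample package names and the helper select_old_packages are hypothetical, and the command is only printed, not run.

```python
from fnmatch import fnmatchcase

def select_old_packages(installed, pattern="*2_2_6_*"):
    """Return the installed package names that match the old-version glob."""
    return sorted(name for name in installed if fnmatchcase(name, pattern))

# Hypothetical sample of installed package names (for example, as listed by
# `rpm -qa`); the names on a real host will differ.
installed = [
    "hadoop_2_2_6_0_2800-hdfs",
    "hadoop_2_2_9_0_3393-hdfs",
    "zookeeper_2_2_6_0_2800",
    "hdp-select",
]

old = select_old_packages(installed)

# Print the command for manual review instead of executing it.
print("yum erase " + " ".join(old))
```

Printing the command first leaves room to check that nothing from the current version matches the pattern.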

Thanks,
Alejandro

On 11/23/15, 10:19 PM, "Andrew Robertson" <andyrobertson101@gmail.com>
wrote:

>I've created AMBARI-14031 & AMBARI-14032 for these issues.
>
>And thanks for the pointer for the #experimental /
>opsDuringRollingUpgrade workaround.
>
>One more question - is there a process for uninstalling old / unused
>versions of HDP?  For example, now that I've upgraded from 2.2.8 ->
>2.2.9, is there a way to remove 2.2.6?
>
>On Mon, Nov 23, 2015 at 11:35 AM, Alejandro Fernandez
><afernandez@hortonworks.com> wrote:
>> Hi,
>>
>> I wish your experience with Rolling Upgrades had been better.
>> I'll do my best to explain the solution to each one of those items. As a
>> developer, I like to hear this feedback so we can make the product better.
>>
>> * Cluster is locked down while in the middle of upgrade:
>> Operations like changing configs, adding hosts, adding services, etc. are
>> disallowed by default.
>> This is meant to prevent the user from drastically changing the stack
>> configs and ending up in a worse state.
>> Cluster operators can still change configs by navigating to
>> http://server:8080/#/experimental and enabling "opsDuringRollingUpgrade".
>> I completely agree that we need to be more flexible in this area since
>> configs are likely to break, and the savvy users should still be allowed
>> to change them.
>>
>> * Configs are only changed in major stack versions:
>> In HDP 2.2.*->2.2.*, we don't expect any config changes, so the Upgrade
>> Pack doesn't orchestrate any, whereas a 2.2.*->2.3.* has many config
>> changes.
>> At times, this will break, and we typically find out about it during
>> testing and reports from users with custom configs.
>> Tools like SmartSense can also help to point out incorrect configs. In the
>> future, we may relax this so that even minor versions are allowed to
>> change configs.
>>
>> * Unable to finalize since hosts are not on the new version:
>> We've talked about a way to "force finalize" the versions. Today, Ambari
>> is very strict about requiring all hosts to be updated.
>> As a workaround, we have a Python script called "RU Magician" that will
>> allow you to fix things and force any version to CURRENT; check out
>> https://github.com/apache/ambari/tree/branch-2.1.2/contrib/ru_magician
>> You ran the correct SQL statements, so kudos to you for that.
>>
>> * Components that don't advertise a version:
>> Some components like ZKFC, AMS, MySQL, Kerberos Client, don't need to
>> advertise a version.
>> In the case of ZKFC, it is because it uses the same binary as that of
>> NameNode. So perhaps an earlier version of Ambari caused it to stay stuck
>> on 2.2.6 in the DB.
>> If you feel more comfortable, you can change ZKFC's version to 'UNKNOWN'.
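
A minimal sketch of what changing ZKFC's version to 'UNKNOWN' could look like, run here against an in-memory SQLite stand-in rather than the real Ambari database. The toy table keeps only the component_name and version columns seen in the query later in this thread, plus a simplified host_name column; the real hostcomponentstate schema has more columns, so treat this as an illustration and back up the database before trying anything similar.

```python
import sqlite3

# In-memory stand-in for the Ambari DB; the schema is a simplified assumption.
db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE hostcomponentstate "
    "(host_name TEXT, component_name TEXT, version TEXT)"
)
db.executemany(
    "INSERT INTO hostcomponentstate VALUES (?, ?, ?)",
    [
        ("node1", "ZKFC", "2.2.6.0-2800"),      # stale version from the thread
        ("node2", "ZKFC", "2.2.6.0-2800"),
        ("node1", "NAMENODE", "2.2.9.0-3393"),  # already on the new version
    ],
)

# Only touch ZKFC rows that still carry the stale version.
with db:
    db.execute(
        "UPDATE hostcomponentstate SET version = 'UNKNOWN' "
        "WHERE component_name = 'ZKFC' AND version = ?",
        ("2.2.6.0-2800",),
    )

# Re-run the check: rows that are neither on the new version nor 'UNKNOWN'.
stale = db.execute(
    "SELECT host_name, component_name, version FROM hostcomponentstate "
    "WHERE version NOT IN ('2.2.9.0-3393', 'UNKNOWN')"
).fetchall()
print(stale)
```

Restricting the UPDATE to the stale version string keeps it from overwriting components that report a version correctly.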
>>
>> My suggestion is to create Jiras on Apache for the following:
>>
>> Allow force finalizing a version during Stack Upgrade
>> Allow changing configs during the middle of a Stack Upgrade, will need to
>> prompt user with a disclaimer/warning
>>
>> Thanks,
>> Alejandro
>>
>> On 11/22/15, 11:32 PM, "Andrew Robertson" <andyrobertson101@gmail.com>
>> wrote:
>>
>> I performed a rolling upgrade of HDP from 2.2.8 to 2.2.9 today using
>> Ambari 2.1.2.1 & ran into several issues.
>>
>> My YARN resource manager failed to start due to a "Service
>> ResourceManager failed in state INITED; cause:
>> java.lang.IllegalArgumentException: Illegal capacity of -1.0 for
>> node-label=default in queue=root, valid capacity should in range of
>> [0, 100].". (It was working fine with 2.2.8; this may be something new
>> in 2.2.9).
>>
>> As Ambari usage feedback - this was impossible to fix in Ambari while
>> the upgrade was going on, and it added a ton of (down)time to the
>> upgrade. This error caused a number of service checks to time out
>> after a long wait (many checks took 5-15 min to fail). I didn't see
>> any way to fix the error. The only options I had during the upgrade
>> were "Downgrade", which I didn't want to do (it was a test cluster
>> after all, and I wanted to get through it so I could fix it), and
>> "Ignore", which allowed it to continue but caused each step to take 300+
>> seconds. Ambari seemed to lock the configs so I couldn't make changes
>> to fix the issue while the upgrade was going on.  Likewise, I couldn't
>> manually restart the service myself or abort the service checks. Even
>> at the "Verify operation" and the "finalize" checkpoints, where I
>> could "pause" the upgrade - the configs were still locked and I had no
>> ability to start/stop services.
>>
>> At the end, Ambari started giving other errors about being unable to
>> finalize the upgrade. I ended up rebooting the cluster & Ambari - this
>> got it back to a state where I could edit the configs again to fix the
>> YARN RM config.  The fix to the RM not starting ended up being the
>> same as AMBARI-11358, which appears to only have been fixed in the
>> HDP2.3 upgrade.
>>
>> Separately, Ambari had the 2.2.9 version waiting to be finalized but I
>> couldn't find any way to do this in the UI after the restart. So I
>> went into the database and ran the following:
>> UPDATE host_version SET state = 'INSTALLED' WHERE state = 'CURRENT';
>> UPDATE host_version SET state = 'CURRENT'
>>   WHERE repo_version_id = <id for 2.2.9.0 version> AND state = 'UPGRADED';
>> UPDATE cluster_version SET state = 'INSTALLED' WHERE state = 'CURRENT';
>> UPDATE cluster_version SET state = 'CURRENT'
>>   WHERE repo_version_id = <id for 2.2.9.0 version> AND state = 'UPGRADED';
>> UPDATE hostcomponentstate SET upgrade_state = 'NONE';
>> This seems to have fixed that.
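
The order of the first two statements matters: the old CURRENT rows have to be demoted to INSTALLED before the UPGRADED rows are promoted, otherwise the first UPDATE would demote the new version as well. A minimal sketch of that sequence on an in-memory SQLite toy table follows; the three-column schema and the repo_version_id values 1 and 2 are assumptions for illustration, not the real Ambari schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE host_version (host TEXT, repo_version_id INTEGER, state TEXT)"
)
conn.executemany(
    "INSERT INTO host_version VALUES (?, ?, ?)",
    [
        ("node1", 1, "CURRENT"),   # old version, still marked CURRENT
        ("node1", 2, "UPGRADED"),  # new version, waiting to be finalized
        ("node2", 1, "CURRENT"),
        ("node2", 2, "UPGRADED"),
    ],
)

NEW_REPO_VERSION_ID = 2  # hypothetical id standing in for the 2.2.9.0 version

# Run both steps in one transaction so a failure leaves the table unchanged.
with conn:
    # Step 1: demote whatever is CURRENT today.
    conn.execute(
        "UPDATE host_version SET state = 'INSTALLED' WHERE state = 'CURRENT'"
    )
    # Step 2: promote the upgraded rows of the new version.
    conn.execute(
        "UPDATE host_version SET state = 'CURRENT' "
        "WHERE repo_version_id = ? AND state = 'UPGRADED'",
        (NEW_REPO_VERSION_ID,),
    )

rows = conn.execute(
    "SELECT host, repo_version_id, state FROM host_version "
    "ORDER BY host, repo_version_id"
).fetchall()
print(rows)
```

After both steps every host has exactly one CURRENT row, the one for the new version.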
>>
>> Possibly unrelated - I did find there are 2 services that show up with
>> an even older version when checking the Ambari database:
>>
>> ambari=> SELECT h.host_name, hcs.service_name, hcs.component_name, hcs.version
>>   FROM hostcomponentstate hcs JOIN hosts h ON hcs.host_id = h.host_id
>>   WHERE hcs.version NOT IN ('2.2.9.0-3393', 'UNKNOWN');
>>  host_name | service_name | component_name |   version
>> -----------+--------------+----------------+--------------
>>  node2     | HDFS         | ZKFC           | 2.2.6.0-2800
>>  node1     | HDFS         | ZKFC           | 2.2.6.0-2800
>>
>> (But I had upgraded from 2.2.8; 2.2.6 was the version before that).
>>
>> Any suggestions on how to fix this? I think Ambari may just be
>> confused, but I'm not sure how to verify this and/or fix Ambari (other
>> than overwrite this field in the database?). I've verified the yum
>> versions are right for the package and the right processes are
>> actually running on the machine.
>>
>> Thank you!
>>
>>
>

