ambari-user mailing list archives

From Andrew Robertson <andyrobertson...@gmail.com>
Subject Ambari rolling upgrade usage feedback / finalize question
Date Mon, 23 Nov 2015 07:32:18 GMT
I performed a rolling upgrade of HDP from 2.2.8 to 2.2.9 today using
Ambari 2.1.2.1 & ran into several issues.

My YARN ResourceManager failed to start with the error "Service
ResourceManager failed in state INITED; cause:
java.lang.IllegalArgumentException: Illegal capacity of -1.0 for
node-label=default in queue=root, valid capacity should in range of
[0, 100]." (It was working fine with 2.2.8; this may be something new
in 2.2.9.)

As Ambari usage feedback: this was impossible to fix in Ambari while
the upgrade was going on, and it added a ton of (down)time to the
upgrade. The error caused a number of service checks to time out
after a long wait (many checks took 5-15 min to fail). I didn't see
any way to fix the error. The only options I had during the upgrade
were "Downgrade", which I didn't want to do (it was a test cluster
after all, and I wanted to get through the upgrade so I could fix it
afterwards), and "Ignore", which allowed the upgrade to continue but
caused each step to take 300+ seconds. Ambari seemed to lock the
configs, so I couldn't make changes to fix the issue while the upgrade
was going on. Likewise, I couldn't manually restart the service myself
or abort the service checks. Even at the "Verify operation" and
"finalize" checkpoints, where I could "pause" the upgrade, the configs
were still locked and I had no ability to start/stop services.

At the end, Ambari started giving other errors about being unable to
finalize the upgrade. I ended up rebooting the cluster and Ambari, which
got it back to a state where I could edit the configs again and fix the
YARN RM config. The fix for the RM not starting turned out to be the
same as AMBARI-11358, which appears to have been fixed only in the
HDP 2.3 upgrade.

Separately, Ambari had the 2.2.9 version waiting to be finalized but I
couldn't find any way to do this in the UI after the restart. So I
went into the database and ran the following:

UPDATE host_version SET state = 'INSTALLED' WHERE state = 'CURRENT';
UPDATE host_version SET state = 'CURRENT'
  WHERE repo_version_id = <id for 2.2.9.0 version> AND state = 'UPGRADED';
UPDATE cluster_version SET state = 'INSTALLED' WHERE state = 'CURRENT';
UPDATE cluster_version SET state = 'CURRENT'
  WHERE repo_version_id = <id for 2.2.9.0 version> AND state = 'UPGRADED';
UPDATE hostcomponentstate SET upgrade_state = 'NONE';

This seems to have fixed that.
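
For anyone repeating this, the statements above are probably safer when applied as a single transaction, so that a failure partway through leaves the version tables unchanged. The following is a minimal Python sketch of that idea. It runs against an in-memory SQLite database, not the real Ambari database. The table layouts are simplified assumptions reduced to the columns the statements touch, the names finalize_version and demo and the sample rows are hypothetical, and repo version ids 1 and 2 merely stand in for the 2.2.8 and 2.2.9 entries.

```python
import sqlite3


def finalize_version(conn, repo_version_id):
    """Promote repo_version_id from UPGRADED to CURRENT in one transaction.

    Mirrors the manual UPDATE statements from the message. Raises
    ValueError, leaving the data unchanged, if either version table has
    no UPGRADED row for the requested id.
    """
    # "with conn" commits on success and rolls back if an exception escapes.
    with conn:
        # Table names come from this fixed tuple, never from user input.
        for table in ("host_version", "cluster_version"):
            pending = conn.execute(
                f"SELECT COUNT(*) FROM {table} "
                "WHERE repo_version_id = ? AND state = 'UPGRADED'",
                (repo_version_id,),
            ).fetchone()[0]
            if pending == 0:
                raise ValueError(
                    f"no UPGRADED rows in {table} for id {repo_version_id}"
                )
            # Demote the old version first, then promote the new one.
            conn.execute(
                f"UPDATE {table} SET state = 'INSTALLED' WHERE state = 'CURRENT'"
            )
            conn.execute(
                f"UPDATE {table} SET state = 'CURRENT' "
                "WHERE repo_version_id = ? AND state = 'UPGRADED'",
                (repo_version_id,),
            )
        conn.execute("UPDATE hostcomponentstate SET upgrade_state = 'NONE'")


def demo():
    """Build a toy database (hypothetical rows) and finalize version id 2."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE host_version (
            host_name TEXT, repo_version_id INTEGER, state TEXT);
        CREATE TABLE cluster_version (repo_version_id INTEGER, state TEXT);
        CREATE TABLE hostcomponentstate (
            component_name TEXT, upgrade_state TEXT);
        INSERT INTO host_version VALUES
            ('node1', 1, 'CURRENT'), ('node1', 2, 'UPGRADED'),
            ('node2', 1, 'CURRENT'), ('node2', 2, 'UPGRADED');
        INSERT INTO cluster_version VALUES (1, 'CURRENT'), (2, 'UPGRADED');
        INSERT INTO hostcomponentstate VALUES
            ('ZKFC', 'COMPLETE'), ('NAMENODE', 'COMPLETE');
        """
    )
    finalize_version(conn, 2)
    return conn


if __name__ == "__main__":
    conn = demo()
    for row in conn.execute(
        "SELECT repo_version_id, state FROM cluster_version "
        "ORDER BY repo_version_id"
    ):
        print(row)
```

The check for UPGRADED rows runs before any UPDATE on each table, so passing a wrong id raises an error instead of leaving a cluster with no CURRENT version. Against a real Ambari database the same pattern would need that database's own driver, and taking a backup first seems prudent.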

Possibly unrelated: when checking the Ambari database, I found two
components that show up with an even older version:

ambari=> SELECT h.host_name, hcs.service_name, hcs.component_name, hcs.version
         FROM hostcomponentstate hcs
         JOIN hosts h ON hcs.host_id = h.host_id
         WHERE hcs.version NOT IN ('2.2.9.0-3393', 'UNKNOWN');
 host_name | service_name | component_name |   version
-----------+--------------+----------------+--------------
 node2     | HDFS         | ZKFC           | 2.2.6.0-2800
 node1     | HDFS         | ZKFC           | 2.2.6.0-2800

(But I had upgraded from 2.2.8; 2.2.6 was the version before that).

Any suggestions on how to fix this? I think Ambari may just be
confused, but I'm not sure how to verify this and/or fix Ambari (other
than overwriting this field in the database). I've verified that the
yum versions are right for the package and that the right processes
are actually running on the machines.

Thank you!
