ambari-issues mailing list archives

From "Dmitry Lysnichenko (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (AMBARI-18786) HDP Upgrade fails when the cluster size is large
Date Thu, 03 Nov 2016 14:47:58 GMT

     [ https://issues.apache.org/jira/browse/AMBARI-18786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dmitry Lysnichenko updated AMBARI-18786:
----------------------------------------
    Component/s: ambari-server

> HDP Upgrade fails when the cluster size is large
> ------------------------------------------------
>
>                 Key: AMBARI-18786
>                 URL: https://issues.apache.org/jira/browse/AMBARI-18786
>             Project: Ambari
>          Issue Type: Bug
>          Components: ambari-server
>            Reporter: Dmitry Lysnichenko
>            Assignee: Dmitry Lysnichenko
>         Attachments: AMBARI-18786.patch
>
>
> Starting with Ambari 2.4, HDP upgrades on large clusters fail during the NameNode restart.
> The restart command waits for the NameNode to leave safemode. On a large cluster the NameNode takes longer to leave safemode, so Ambari marks the action as failed because the NameNode did not leave safemode within the timeout configured in the Ambari scripts.
> {code}
> Traceback (most recent call last):
>   File "/usr/lib/python2.6/site-packages/resource_management/libraries/functions/jmx.py", line 42, in get_value_from_jmx
>     return data_dict["beans"][0][property]
> IndexError: list index out of range
> Traceback (most recent call last):
>   File "/var/lib/ambari-agent/cache/common-services/HDFS/2.1.0.2.0/package/scripts/namenode.py", line 420, in <module>
>     NameNode().execute()
>   File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 280, in execute
>     method(env)
>   File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 720, in restart
>     self.start(env, upgrade_type=upgrade_type)
>   File "/var/lib/ambari-agent/cache/common-services/HDFS/2.1.0.2.0/package/scripts/namenode.py", line 101, in start
>     upgrade_suspended=params.upgrade_suspended, env=env)
>   File "/usr/lib/python2.6/site-packages/ambari_commons/os_family_impl.py", line 89, in thunk
>     return fn(*args, **kwargs)
>   File "/var/lib/ambari-agent/cache/common-services/HDFS/2.1.0.2.0/package/scripts/hdfs_namenode.py", line 184, in namenode
>     if is_this_namenode_active() is False:
>   File "/usr/lib/python2.6/site-packages/resource_management/libraries/functions/decorator.py", line 55, in wrapper
>     return function(*args, **kwargs)
>   File "/var/lib/ambari-agent/cache/common-services/HDFS/2.1.0.2.0/package/scripts/hdfs_namenode.py", line 554, in is_this_namenode_active
>     raise Fail(format("The NameNode {namenode_id} is not listed as Active or Standby, waiting..."))
> resource_management.core.exceptions.Fail: The NameNode nn1 is not listed as Active or Standby, waiting...
> {code}
> To resolve this, we increased the timeout in Ambari:
> 1. Increase the retry settings in /var/lib/ambari-server/resources/common-services/HDFS/2.1.0.2.0/package/scripts/hdfs_namenode.py from this:
> @retry(times=5, sleep_time=5, backoff_factor=2, err_class=Fail)
> to this:
> @retry(times=25, sleep_time=25, backoff_factor=2, err_class=Fail)
> 2. Restart the Ambari server.
> After this, the upgrade went through fine.
> I think it's better to increase the timeout permanently so that we don't have to deal with this issue again.
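
For illustration, the effect of the `@retry` parameters quoted in the issue can be approximated with a small decorator. This is a minimal sketch, not Ambari's actual `retry` implementation: the `Fail` class is a stand-in for `resource_management.core.exceptions.Fail`, and the `sleep` and `max_sleep_time` parameters and the `worst_case_wait` helper are hypothetical names introduced only for this example. It assumes the delay is multiplied by `backoff_factor` after each failed attempt; the real decorator may cap or compute delays differently.

```python
import functools
import time


class Fail(Exception):
    """Stand-in for resource_management.core.exceptions.Fail (assumption)."""


def retry(times=3, sleep_time=1, backoff_factor=1, err_class=Exception,
          max_sleep_time=None, sleep=time.sleep):
    """Call the wrapped function up to `times` times.

    After each failed attempt except the last, sleep for the current delay,
    then multiply the delay by `backoff_factor`. `max_sleep_time` (hypothetical)
    optionally caps the delay; `sleep` is injectable so tests need not wait.
    """
    def decorator(function):
        @functools.wraps(function)
        def wrapper(*args, **kwargs):
            delay = sleep_time
            for _ in range(times - 1):
                try:
                    return function(*args, **kwargs)
                except err_class:
                    sleep(delay)
                    delay *= backoff_factor
                    if max_sleep_time is not None:
                        delay = min(delay, max_sleep_time)
            # Final attempt: any exception propagates to the caller.
            return function(*args, **kwargs)
        return wrapper
    return decorator


def worst_case_wait(times, sleep_time, backoff_factor, max_sleep_time=None):
    """Total seconds slept if every attempt fails, under the same assumptions."""
    total, delay = 0, sleep_time
    for _ in range(times - 1):
        total += delay
        delay *= backoff_factor
        if max_sleep_time is not None:
            delay = min(delay, max_sleep_time)
    return total


if __name__ == "__main__":
    # Original settings from the issue: times=5, sleep_time=5, backoff_factor=2.
    print(worst_case_wait(5, 5, 2))
```

Under these assumptions the original settings sleep for at most 75 seconds in total (5 + 10 + 20 + 40) before the final attempt. Raising `times` and `sleep_time` lengthens that window, which matches the report that the upgrade succeeded once the NameNode was given more time to leave safemode.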



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
