ambari-issues mailing list archives

From "Nate Cole (JIRA)" <>
Subject [jira] [Created] (AMBARI-20736) Allow Potentially Long Running Restart Commands To Have Their Own Timeout
Date Tue, 11 Apr 2017 18:36:41 GMT
Nate Cole created AMBARI-20736:

             Summary: Allow Potentially Long Running Restart Commands To Have Their Own Timeout
                 Key: AMBARI-20736
             Project: Ambari
          Issue Type: Bug
          Components: ambari-server
            Reporter: Nate Cole
            Assignee: Nate Cole
            Priority: Critical
             Fix For: 2.5.1

During an upgrade of a cluster, some commands are expected to take a very long time, depending
on the size of the cluster and how much data is stored. For example, a NameNode restart
with SafeMode exit may take in excess of 30 minutes on a large cluster, while on some clusters
the same restart could take less than 1 minute.

Today, the only way to adjust this timeout is to do so across the board for
all commands by editing {{}} and setting {{agent.task.timeout}}. This solution
doesn't work very well, since the majority of restarts during an upgrade are not on a master

There needs to be a way to instruct Ambari that a restart should be allowed to run for a relatively
long period of time. 

- Both Java and Python need to be considered here. We don't want Python to give up and return
a {{FAILED}} state and we don't want Ambari server to set the task to {{TIMEDOUT}}.

- This can be useful in both normal restarts and upgrade scenarios. 
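
As a rough illustration of the per-command behavior described above, the following sketch runs a command with an optional timeout override and falls back to a global default when none is given. It is not Ambari agent code: {{run_command}}, {{DEFAULT_TIMEOUT_SECONDS}}, and the {{COMPLETED}} status string are hypothetical names chosen for this example, and the default value is only a placeholder for whatever {{agent.task.timeout}} is set to.

```python
import subprocess
import sys

# Placeholder for the global value configured via agent.task.timeout (assumed).
DEFAULT_TIMEOUT_SECONDS = 900


def run_command(argv, timeout_override=None):
    """Run argv and return a (status, exit_code) tuple.

    timeout_override is a hypothetical per-command timeout in seconds.
    When it is None, the global default applies.
    """
    if timeout_override is not None:
        timeout = timeout_override
    else:
        timeout = DEFAULT_TIMEOUT_SECONDS

    try:
        completed = subprocess.run(argv, timeout=timeout)
    except subprocess.TimeoutExpired:
        # The child process is killed by subprocess.run before this is raised.
        return ("TIMEDOUT", None)

    status = "COMPLETED" if completed.returncode == 0 else "FAILED"
    return (status, completed.returncode)


if __name__ == "__main__":
    # A long-running restart would pass a larger override, e.g. 1800 seconds.
    print(run_command([sys.executable, "-c", "pass"], timeout_override=1800))
```

The point of the sketch is only that the override travels with the individual command, so a slow NameNode restart can wait longer without raising the limit for every other command.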

h3. Upgrade Only
If this functionality is considered in the context of an upgrade only, then the logic
could be placed inside of the upgrade XML packs:
    <service name="HDFS">
      <component name="NAMENODE">
        <task xsi:type="restart-task" timeout="1800"/>
      </component>
    </service>

- This would allow future mpacks to control the restart of components. Perhaps
the timeout value can even be abstracted out slightly, as a named parameter:

    <service name="HDFS">
      <component name="NAMENODE">
        <task xsi:type="restart-task" timeout="upgrade.parameter.master.restart.long"/>
      </component>
    </service>

upgrade.parameter.slave.restart.short = 300
upgrade.parameter.slave.restart.long = 900
upgrade.parameter.master.restart.short = 1500
upgrade.parameter.master.restart.long = 1800
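
One possible way to interpret the {{timeout}} attribute under this proposal is sketched below: a plain number is taken as seconds, and anything else is looked up as a named parameter. The function name {{resolve_timeout}}, the in-memory dictionary, and the fallback behavior are assumptions made for illustration; the issue does not specify where the parameters would be stored or how a missing key should be handled.

```python
# Named parameters from the proposal, held in a plain dict for illustration.
UPGRADE_PARAMETERS = {
    "upgrade.parameter.slave.restart.short": 300,
    "upgrade.parameter.slave.restart.long": 900,
    "upgrade.parameter.master.restart.short": 1500,
    "upgrade.parameter.master.restart.long": 1800,
}


def resolve_timeout(raw_value, parameters, default):
    """Return a timeout in seconds for a task's timeout attribute.

    raw_value is the attribute text, or None when the attribute is absent.
    A numeric string is used directly; any other string is treated as a
    parameter key. An unknown key raises KeyError (an assumed policy).
    """
    if raw_value is None:
        return default

    text = raw_value.strip()
    if text.isdigit():
        return int(text)

    if text in parameters:
        return int(parameters[text])

    raise KeyError("unknown timeout parameter: " + text)


if __name__ == "__main__":
    print(resolve_timeout("1800", UPGRADE_PARAMETERS, 900))
    print(resolve_timeout("upgrade.parameter.slave.restart.short", UPGRADE_PARAMETERS, 900))
```

Failing loudly on an unknown key is a design choice in this sketch; silently falling back to the default would hide typos in an upgrade pack.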
