ambari-dev mailing list archives

From "Alejandro Fernandez" <afernan...@hortonworks.com>
Subject Re: Review Request 27754: Hadoop install with yum timesout after 10 mins
Date Sat, 08 Nov 2014 22:58:47 GMT

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/27754/
-----------------------------------------------------------

(Updated Nov. 8, 2014, 10:58 p.m.)


Review request for Ambari, Mahadev Konar, Sumit Mohanty, Sid Wagle, and Yusaku Sako.


Changes
-------

Will commit this first to trunk, and if all goes fine after 1 day, will commit to branch-1.7.0


Bugs: AMBARI-8220
    https://issues.apache.org/jira/browse/AMBARI-8220


Repository: ambari


Description
-------

Installs very often fail because of a timeout while installing the hadoop_2_2* packages,
which can take 8-12 mins.

Each service has a metainfo.xml file that defines the timeout for each Component for all types
of actions (e.g., INSTALL, START, CONFIGURE, STOP).

Ambari doesn't currently have a mechanism to set a different timeout just for the INSTALL
operation, so instead the server-side Java code does the following:

1. Get the default agent timeout from the ambari.properties file (this default is being
   increased from 10 mins to 15 mins).

2. Get the service component's timeout, if it exists. If the operation is an INSTALL and
   the service component's timeout is less than the default timeout, use the default timeout.
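
The selection rule above can be sketched as follows. This is a minimal illustration in
Python, not the Java code from the patch; the function name effective_timeout and its
parameters are hypothetical, and falling back to the agent default when a component
defines no timeout is an assumption.

```python
def effective_timeout(command, component_timeout, agent_default_timeout):
    """Return the timeout (in seconds) to send along with a command.

    Hypothetical helper: the names are illustrative and not taken from the patch.
    """
    # Assumption: a component without its own timeout uses the agent default.
    if component_timeout is None:
        return agent_default_timeout
    # For INSTALL only, never go below the agent default.
    if command == "INSTALL" and component_timeout < agent_default_timeout:
        return agent_default_timeout
    # All other cases keep the component's own timeout.
    return component_timeout
```

With an agent default of 900, an INSTALL of a component whose timeout is 642 would get
900, while a component whose timeout is 1042 would keep 1042.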


Diffs
-----

  ambari-agent/src/main/python/ambari_agent/PythonExecutor.py 874b70b 
  ambari-server/conf/unix/ambari.properties 8563cf2 
  ambari-server/src/main/java/org/apache/ambari/server/configuration/Configuration.java a0d5b39 
  ambari-server/src/main/java/org/apache/ambari/server/controller/AmbariManagementControllerImpl.java 4f69dbb 

Diff: https://reviews.apache.org/r/27754/diff/


Testing (updated)
-------

----------------------------------------------------------------------
Total run:693
Total errors:0
Total failures:0
OK


Copied all of the changed files,

yes | cp /vagrant/ambari/ambari-agent/src/main/python/ambari_agent/PythonExecutor.py  /usr/lib/python2.6/site-packages/ambari_server/PythonExecutor.py
yes | cp /vagrant/ambari/ambari-agent/src/main/python/ambari_agent/PythonExecutor.py  /usr/lib/python2.6/site-packages/ambari_agent/PythonExecutor.py
yes | cp /vagrant/ambari/ambari-server/target/ambari-server-*.jar                     /usr/lib/ambari-server/ambari-server-*.jar

Edited /etc/ambari-server/conf/ambari.properties and changed the agent.task.timeout value
from 600 to 900.
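
To double-check the edited value, the key can be read back from the file contents. The
snippet below is only a sketch: read_property is a hypothetical helper, and it assumes
simple key=value lines with # comments rather than the full Java properties syntax.

```python
def read_property(text, key, default=None):
    """Return the value for `key` from properties-style text, or `default`.

    Hypothetical helper; handles only plain key=value lines and # comments.
    """
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and comments.
        if not line or line.startswith("#"):
            continue
        name, sep, value = line.partition("=")
        if sep and name.strip() == key:
            return value.strip()
    return default


# Sample text standing in for /etc/ambari-server/conf/ambari.properties
sample = "# agent settings\nagent.task.timeout=900\n"
print(read_property(sample, "agent.task.timeout"))
```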

Then modified the ResourceManager and NodeManager timeouts in
/var/lib/ambari-agent/cache/stacks/HDP/2.0.6/services/YARN/metainfo.xml as follows,
ResourceManager         <timeout>642</timeout>
NodeManager             <timeout>1042</timeout>

Then ran ambari-server restart

Then created a cluster and added all of the services, and reran service checks.

After adding the YARN services, the command-*.json files contained,


    "commandParams": {
        "command_timeout": "1042",
        "script": "scripts/nodemanager.py",
        "script_type": "PYTHON",
        "service_package_folder": "HDP/2.0.6/services/YARN/package",
        "hooks_folder": "HDP/2.0.6/hooks"
    },

    "commandParams": {
        "command_timeout": "900",
        "script": "scripts/resourcemanager.py",
        "script_type": "PYTHON",
        "service_package_folder": "HDP/2.0.6/services/YARN/package",
        "hooks_folder": "HDP/2.0.6/hooks"
    },


Notice that the ResourceManager timeout (642) was initially less than the agent default (900),
so it was raised to the default, while the NodeManager timeout (1042) was already larger and
was left unchanged.
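
The same check can be scripted. The sketch below assumes each command-*.json file is a
JSON object with a top-level "commandParams" key, as the excerpts above suggest; the
function name command_timeout is hypothetical.

```python
import json


def command_timeout(command_json_text):
    """Return command_timeout (as an int) from the text of a command file.

    Assumes "commandParams" is a top-level key of the JSON object.
    """
    doc = json.loads(command_json_text)
    return int(doc["commandParams"]["command_timeout"])


# Trimmed-down stand-in for the ResourceManager command file shown above
sample_rm = '{"commandParams": {"command_timeout": "900", "script": "scripts/resourcemanager.py"}}'
print(command_timeout(sample_rm))
```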


Thanks,

Alejandro Fernandez

