ambari-dev mailing list archives

From "Jonathan Hurley" <jhur...@hortonworks.com>
Subject Review Request 32107: Ambari Agent Alerts Prevents Binding to the Ping Port Listener On Startup
Date Mon, 16 Mar 2015 15:32:44 GMT

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/32107/
-----------------------------------------------------------

Review request for Ambari, Nate Cole and Tom Beerbower.


Bugs: AMBARI-10083
    https://issues.apache.org/jira/browse/AMBARI-10083


Repository: ambari


Description
-------

This is a hard one to reproduce. When the agent spawns a child process and then terminates
while that child process is running, the ping port server socket is sometimes held by the
child:

```
UID        PID  PPID  C STIME TTY          TIME CMD
root     23667 23663  0 09:40 ?        00:00:00 /usr/bin/sudo su ambari-qa -l -s /bin/bash -c export PATH='/usr/sbin:/sbin:/usr/lib/ambari-server/*:/usr/sbin:/sbin:/usr/lib/ambari-server/*:/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin:/root/bin:/var/lib/ambari-agent:/var/lib/ambari-agent:/bin/:/usr/bin/:/usr/lib/hive/bin/:/usr/sbin/' ; hive --hiveconf hive.metastore.uris=thrift://hdp2-02-02:9083 -e 'show databases;'

INFO 2015-03-11 09:40:33,433 PingPortListener.py:62 - Ping port listener killed
```
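
The description does not say how the child ends up holding the socket. One plausible mechanism is
file descriptor inheritance, and the sketch below reproduces that effect in isolation. It is an
illustration under assumptions, not agent code: it assumes Python 3.7+ on a POSIX system, the
listener is a stand-in bound to an OS-chosen port, and port_is_free() and the sleeping child are
hypothetical.

```python
import socket
import subprocess
import sys


def port_is_free(port):
    """Return True if a fresh socket can bind to the port on localhost."""
    probe = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        probe.bind(("127.0.0.1", port))
        return True
    except OSError:
        return False
    finally:
        probe.close()


# Stand-in for the ping port listener (the OS picks a free port).
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))
listener.listen(1)
port = listener.getsockname()[1]

# Python 3 marks descriptors non-inheritable by default; opt in here to
# imitate a child that inherits the parent's open sockets.
listener.set_inheritable(True)

# Hypothetical long-running child, standing in for the hive command.
child = subprocess.Popen(
    [sys.executable, "-c", "import time; time.sleep(30)"],
    close_fds=False,
)

# The parent gives up its copy, but the child still holds the socket.
listener.close()
held_by_child = not port_is_free(port)

# Once the child is gone, the port can be bound again.
child.kill()
child.wait()
free_after_child_exit = port_is_free(port)

print("held while child runs:", held_by_child)
print("free after child exits:", free_after_child_exit)
```

Passing close_fds=True (the Python 3 default) or leaving the socket non-inheritable keeps the child
from holding the port in this sketch.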

We should always use sys.exit() instead of os._exit(), since os._exit() terminates the process
immediately without running cleanup handlers. This includes the handlers registered inside
APScheduler.
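
The difference between the two exit calls can be seen with a minimal sketch, assuming Python 3.7+.
It is not taken from the agent code: run_child() and the "cleanup ran" handler are hypothetical, and
an atexit handler stands in for whatever cleanup the agent and APScheduler register.

```python
import subprocess
import sys

# Child script: registers a cleanup handler, then exits the given way.
CHILD_TEMPLATE = """
import atexit, os, sys
atexit.register(lambda: print("cleanup ran"))
{exit_call}
"""


def run_child(exit_call):
    """Run the child script with the given exit call and return its stdout."""
    code = CHILD_TEMPLATE.format(exit_call=exit_call)
    result = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
    )
    return result.stdout.strip()


# sys.exit() raises SystemExit, so the interpreter shuts down normally
# and the atexit handler runs.
clean_output = run_child("sys.exit(0)")

# os._exit() ends the process immediately, skipping atexit handlers.
abrupt_output = run_child("os._exit(0)")

print("sys.exit():", repr(clean_output))
print("os._exit():", repr(abrupt_output))
```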


Diffs
-----

  ambari-agent/src/main/python/ambari_agent/AlertSchedulerHandler.py a2ea8ef 
  ambari-agent/src/main/python/ambari_agent/Controller.py 2300b47 
  ambari-agent/src/main/python/ambari_agent/ProcessHelper.py 2d99dd1 
  ambari-agent/src/main/python/ambari_agent/main.py ebf0781 
  ambari-agent/src/test/python/ambari_agent/TestMain.py 0a3e878 
  ambari-server/src/main/resources/common-services/HIVE/0.12.0.2.0/package/alerts/alert_hive_metastore.py 120c4a0
  ambari-server/src/main/resources/common-services/HIVE/0.12.0.2.0/package/alerts/alert_hive_thrift_port.py c496717

Diff: https://reviews.apache.org/r/32107/diff/


Testing
-------

Started/Stopped agents about a bigillion times with some services started, some stopped, and
some sick. 

[INFO] Rat check: Summary of files. Unapproved: 0 unknown: 0 generated: 0 approved: 124 licence.
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 10.286 s
[INFO] Finished at: 2015-03-16T11:02:25-04:00
[INFO] Final Memory: 8M/81M
[INFO] ------------------------------------------------------------------------


Thanks,

Jonathan Hurley

