ambari-dev mailing list archives

From "Jonathan Hurley" <jhur...@hortonworks.com>
Subject Re: Review Request 39339: Expose Alert Grace Period Setting in Agents
Date Thu, 15 Oct 2015 16:49:00 GMT


> On Oct. 15, 2015, 11:59 a.m., Jonathan Hurley wrote:
> > ambari-agent/src/main/python/ambari_agent/AlertSchedulerHandler.py, lines 69-72
> > <https://reviews.apache.org/r/39339/diff/1/?file=1098924#file1098924line69>
> >
> >     Did you verify that the APS code will work with properties that are prefixed with `apscheduler`? We didn't have these before.
> 
> Andrew Onischuk wrote:
>     It looks like our APS_CONFIG was previously ignored, and the default values were used instead. See src/main/python/ambari_agent/apscheduler/scheduler.py:59
>     I found this while investigating why misfire_grace_time is always 1, even when the value in APS_CONFIG is changed.
>     I verified that alerts trigger if services are down. I also do not see any errors in ambari-agent.log.
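
For readers following the thread, here is a minimal sketch of why unprefixed keys could be silently ignored. It is a simplified model of the behavior Andrew describes, not the bundled APScheduler code itself: the helper names `combine_prefixed_options` and `effective_grace_time` are hypothetical, and the assumption is that the scheduler only honors global options whose keys start with `apscheduler.` and otherwise falls back to a default of 1 second.

```python
def combine_prefixed_options(global_config, prefix, local_options=None):
    """Keep only keys that start with `prefix` (prefix stripped),
    then overlay any explicitly passed local options."""
    prefix_len = len(prefix)
    combined = {}
    for key, value in global_config.items():
        if key.startswith(prefix):
            combined[key[prefix_len:]] = value
    combined.update(local_options or {})
    return combined


def effective_grace_time(global_config):
    """Return the grace time the scheduler would end up using.

    Assumption: unrecognized (unprefixed) keys are dropped, and the
    default of 1 second applies when the option is absent.
    """
    options = combine_prefixed_options(global_config, "apscheduler.")
    return int(options.pop("misfire_grace_time", 1))


unprefixed = {"misfire_grace_time": 5, "coalesce": True}
prefixed = {"apscheduler.misfire_grace_time": 5, "apscheduler.coalesce": True}

print(effective_grace_time(unprefixed))  # 1: the configured value is ignored
print(effective_grace_time(prefixed))    # 5: the configured value is honored
```

Under this model, changing the value in an unprefixed `APS_CONFIG` has no effect, which matches the observation that `misfire_grace_time` was always 1.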

Ah, very nice catch.


- Jonathan


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/39339/#review102787
-----------------------------------------------------------


On Oct. 15, 2015, 10:40 a.m., Andrew Onischuk wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/39339/
> -----------------------------------------------------------
> 
> (Updated Oct. 15, 2015, 10:40 a.m.)
> 
> 
> Review request for Ambari, Jonathan Hurley and Nate Cole.
> 
> 
> Bugs: AMBARI-13434
>     https://issues.apache.org/jira/browse/AMBARI-13434
> 
> 
> Repository: ambari
> 
> 
> Description
> -------
> 
> On some deployments, hosts may be required to run many alerts depending on the
> number of components installed. If the number of components is large, it's
> possible that alert jobs may miss their scheduled intervals. The default grace
> period set by APS is 1 second, which is rather aggressive.
> 
>     
>     
>     
>     WARNING 2015-07-29 20:59:50,733 scheduler.py:496 - Run time of job "947770c6-424a-4ef8-9a46-19eca8fd080b (trigger: interval[0:01:00], next run at: 2015-07-29 20:59:49.309353)" was missed by 0:00:01.423766
>     WARNING 2015-07-29 20:59:50,734 scheduler.py:496 - Run time of job "005b1d50-2aca-4af2-a3b4-bc39e6f65ede (trigger: interval[0:01:00], next run at: 2015-07-29 21:00:49.309646)" was missed by 0:00:01.424313
>     WARNING 2015-07-29 20:59:50,734 scheduler.py:496 - Run time of job "6950ff19-c26c-46b7-8bac-1869773f1380 (trigger: interval[0:01:00], next run at: 2015-07-29 20:59:49.309840)" was missed by 0:00:01.424364
>     WARNING 2015-07-29 20:59:50,735 scheduler.py:496 - Run time of job "d986b9eb-bfd4-400f-b107-5640495eeece (trigger: interval[0:01:00], next run at: 2015-07-29 20:59:49.310025)" was missed by 0:00:01.425144
>     WARNING 2015-07-29 20:59:50,735 scheduler.py:496 - Run time of job "3589154e-a8e3-441d-b3cb-a93fd49e1dfe (trigger: interval[0:01:00], next run at: 2015-07-29 20:59:49.310204)" was missed by 0:00:01.425600
>     WARNING 2015-07-29 20:59:50,736 scheduler.py:496 - Run time of job "04a7f393-800b-4728-95be-28c2ca091ade (trigger: interval[0:01:00], next run at: 2015-07-29 20:59:49.310380)" was missed by 0:00:01.425769
>     WARNING 2015-07-29 20:59:50,737 scheduler.py:496 - Run time of job "f0e2a065-af36-476c-b6b9-b662471c3f22 (trigger: interval[0:01:00], next run at: 2015-07-29 20:59:49.310759)" was missed by 0:00:01.426607
>     WARNING 2015-07-29 20:59:50,738 scheduler.py:496 - Run time of job "76accffd-e390-4aaa-8b35-8219ef4b3057 (trigger: interval[0:01:00], next run at: 2015-07-29 21:00:49.311118)" was missed by 0:00:01.427039
>     WARNING 2015-07-29 20:59:50,738 scheduler.py:496 - Run time of job "e0ce4088-2f0c-4f6d-8642-26ba94b3c66a (trigger: interval[0:01:00], next run at: 2015-07-29 20:59:49.311297)" was missed by 0:00:01.426953
>     WARNING 2015-07-29 20:59:50,739 scheduler.py:496 - Run time of job "9cb39eb2-8ce4-408e-8030-a36362d5b5af (trigger: interval[0:01:00], next run at: 2015-07-29 20:59:49.311501)" was missed by 0:00:01.427677
>     WARNING 2015-07-29 20:59:50,740 scheduler.py:496 - Run time of job "c299b3ab-ced6-4423-8f39-e16427157d98 (trigger: interval[0:01:00], next run at: 2015-07-29 21:00:49.312033)" was missed by 0:00:01.427972
>     WARNING 2015-07-29 20:59:50,740 scheduler.py:496 - Run time of job "cd444594-7859-482d-ae04-348ee7653da2 (trigger: interval[0:01:00], next run at: 2015-07-29 20:59:49.312208)" was missed by 0:00:01.428285
>     WARNING 2015-07-29 20:59:50,741 scheduler.py:496 - Run time of job "9afd8b3e-8850-4f2d-9ce7-a130be6b933b (trigger: interval[0:01:00], next run at: 2015-07-29 20:59:49.312385)" was missed by 0:00:01.428689
>     WARNING 2015-07-29 20:59:50,741 scheduler.py:496 - Run time of job "be140827-a21f-4782-a109-bde8bcbc35c2 (trigger: interval[0:01:00], next run at: 2015-07-29 20:59:49.312574)" was missed by 0:00:01.429298
>     WARNING 2015-07-29 20:59:50,742 scheduler.py:496 - Run time of job "e009b685-717f-4552-8dfb-35a4d9d3d658 (trigger: interval[0:01:00], next run at: 2015-07-29 20:59:49.312751)" was missed by 0:00:01.429906
>     WARNING 2015-07-29 20:59:50,743 scheduler.py:496 - Run time of job "f42e635f-ce2d-47b6-8da3-10c7bfef7c3c (trigger: interval[0:01:00], next run at: 2015-07-29 20:59:49.312927)" was missed by 0:00:01.430541
>     WARNING 2015-07-29 20:59:50,744 scheduler.py:496 - Run time of job "ace91b40-28e2-472a-ac97-8b01dc3bd976 (trigger: interval[0:01:00], next run at: 2015-07-29 20:59:49.313280)" was missed by 0:00:01.430793
>     WARNING 2015-07-29 20:59:50,744 scheduler.py:496 - Run time of job "77ea324a-a836-4f32-a751-1a596417bc11 (trigger: interval[0:01:00], next run at: 2015-07-29 21:00:49.313461)" was missed by 0:00:01.431357
>     WARNING 2015-07-29 20:59:50,745 scheduler.py:496 - Run time of job "e74f63b0-4143-4ebb-9adc-8e124eae1f99 (trigger: interval[0:02:00], next run at: 2015-07-29 20:59:49.313642)" was missed by 0:00:01.431588
>     WARNING 2015-07-29 20:59:50,746 scheduler.py:496 - Run time of job "3640c1eb-e7a2-4783-9480-e7f2129a4093 (trigger: interval[0:02:00], next run at: 2015-07-29 21:01:49.313817)" was missed by 0:00:01.432356
>     WARNING 2015-07-29 20:59:50,746 scheduler.py:496 - Run time of job "5b1fb2e8-8488-429b-9310-ca882b775c25 (trigger: interval[0:01:00], next run at: 2015-07-29 21:00:49.314182)" was missed by 0:00:01.432292
>     WARNING 2015-07-29 20:59:50,746 scheduler.py:496 - Run time of job "509bb649-e065-492a-a258-9a8e48e5d79c (trigger: interval[0:01:00], next run at: 2015-07-29 21:00:49.314359)" was missed by 0:00:01.432485
>     WARNING 2015-07-29 20:59:50,747 scheduler.py:496 - Run time of job "211e7885-368e-415d-8875-a5abb66071c3 (trigger: interval[0:01:00], next run at: 2015-07-29 21:00:49.314546)" was missed by 0:00:01.432553
>     WARNING 2015-07-29 20:59:50,747 scheduler.py:496 - Run time of job "239e8d13-1f31-4b2d-ac6f-b66294700814 (trigger: interval[0:01:00], next run at: 2015-07-29 21:00:49.314722)" was missed by 0:00:01.432682
>     WARNING 2015-07-29 20:59:50,747 scheduler.py:496 - Run time of job "bc300bfc-7f4f-4015-84a6-4bfe761f4167 (trigger: interval[0:02:00], next run at: 2015-07-29 21:01:49.314897)" was missed by 0:00:01.432882
>     WARNING 2015-07-29 20:59:50,748 scheduler.py:496 - Run time of job "0e800a78-48fa-4738-8bab-dc0b57ecc6fa (trigger: interval[0:01:00], next run at: 2015-07-29 21:00:49.315072)" was missed by 0:00:01.433000
>     WARNING 2015-07-29 20:59:50,748 scheduler.py:496 - Run time of job "19190cfd-d9b4-4869-81ec-0bdce227540e (trigger: interval[0:01:00], next run at: 2015-07-29 21:00:49.315246)" was missed by 0:00:01.433040
>     WARNING 2015-07-29 20:59:50,748 scheduler.py:496 - Run time of job "7f102c1d-3e4e-4b46-b89d-f6df4c231591 (trigger: interval[0:01:00], next run at: 2015-07-29 21:00:49.315782)" was missed by 0:00:01.432642
>     WARNING 2015-07-29 20:59:50,749 scheduler.py:496 - Run time of job "8ef15a08-698b-429f-8925-4d6e5c49c01d (trigger: interval[0:01:00], next run at: 2015-07-29 21:00:49.315959)" was missed by 0:00:01.433006
>     
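
To make the numbers in these warnings concrete, the following is a rough sketch that parses the "was missed by" delay from a log line and checks it against a grace period. The function names `missed_by` and `would_misfire` are hypothetical, and the comparison (a run counts as missed when the delay exceeds the grace period) is an assumption about how the scheduler decides, stated here only for illustration.

```python
import re
from datetime import timedelta

# Matches the trailing "was missed by H:MM:SS[.ffffff]" part of a warning.
MISSED_RE = re.compile(r"was missed by (\d+):(\d{2}):(\d{2})(?:\.(\d+))?")


def missed_by(line):
    """Return the delay reported in a log line, or None if absent."""
    match = MISSED_RE.search(line)
    if match is None:
        return None
    hours, minutes, seconds, fraction = match.groups()
    # Pad or trim the fractional part to microsecond precision.
    micros = int((fraction or "0").ljust(6, "0")[:6])
    return timedelta(hours=int(hours), minutes=int(minutes),
                     seconds=int(seconds), microseconds=micros)


def would_misfire(delay, grace_seconds):
    """Assumed rule: the run is skipped when the delay exceeds the grace period."""
    return delay > timedelta(seconds=grace_seconds)


sample = ('WARNING 2015-07-29 20:59:50,733 scheduler.py:496 - Run time of job '
          '"947770c6-424a-4ef8-9a46-19eca8fd080b (trigger: interval[0:01:00], '
          'next run at: 2015-07-29 20:59:49.309353)" was missed by 0:00:01.423766')

delay = missed_by(sample)
print(would_misfire(delay, 1))  # True: about 1.42 s exceeds a 1 s grace period
print(would_misfire(delay, 5))  # False: the same delay fits within 5 s
```

All of the delays logged above are between roughly 1.42 and 1.43 seconds, so under this assumed rule they would be tolerated by a 5 second grace period.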
> 
> The setting can be exposed in
> [AlertSchedulerHandler.py](https://github.com/apache/ambari/blob/trunk/ambari-agent/src/main/python/ambari_agent/AlertSchedulerHandler.py#L46)
> by adding `misfire_grace_time`:
> 
>     
>     
>     
>       APS_CONFIG = { 
>         'threadpool.core_threads': 3,
>         'coalesce': True,
>         'standalone': False,
>         'misfire_grace_time': 5
>       }
>     
> 
>   * Expose the ability to set this grace period via the agent's configuration file
>   * Increase the default amount from 1 second to 5 seconds
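
The two items above could be sketched roughly as follows. This is not the patch under review: the section name `agent`, the option name `alert_grace_period`, and the function `build_aps_config` are assumptions made for illustration, and the example uses Python 3's `configparser` rather than whatever the agent itself uses. The `apscheduler.` key prefix follows the discussion earlier in this thread.

```python
import configparser

# New default proposed in this review (previously 1 second).
DEFAULT_GRACE_PERIOD_SECONDS = 5


def build_aps_config(ini_text):
    """Build scheduler options, taking the grace period from agent config.

    Falls back to the default when the section or option is missing.
    """
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    grace_period = parser.getint(
        "agent", "alert_grace_period",  # hypothetical section/option names
        fallback=DEFAULT_GRACE_PERIOD_SECONDS,
    )
    return {
        "apscheduler.threadpool.core_threads": 3,
        "apscheduler.coalesce": True,
        "apscheduler.standalone": False,
        "apscheduler.misfire_grace_time": grace_period,
    }


configured = build_aps_config("[agent]\nalert_grace_period = 10\n")
defaulted = build_aps_config("[agent]\n")

print(configured["apscheduler.misfire_grace_time"])  # 10
print(defaulted["apscheduler.misfire_grace_time"])   # 5
```

Keeping the default in one constant means a host with no explicit setting still gets the more lenient 5 second value.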
> 
> 
> Diffs
> -----
> 
>   ambari-agent/conf/unix/ambari-agent.ini 3b7631c 
>   ambari-agent/conf/windows/ambari-agent.ini 972e11e 
>   ambari-agent/src/main/python/ambari_agent/AlertSchedulerHandler.py cddee57 
>   ambari-agent/src/main/python/ambari_agent/Controller.py 74a8eac 
>   ambari-agent/src/test/python/ambari_agent/TestAlertSchedulerHandler.py d15cd32 
>   ambari-agent/src/test/python/ambari_agent/TestAlerts.py dab717d 
> 
> Diff: https://reviews.apache.org/r/39339/diff/
> 
> 
> Testing
> -------
> 
> mvn clean test
> 
> 
> Thanks,
> 
> Andrew Onischuk
> 
>

