ambari-dev mailing list archives

From Alejandro Fernandez <afernan...@hortonworks.com>
Subject Re: Review Request 42970: Concurrent kinit Commands Cause Alerts To Randomly Trigger
Date Fri, 29 Jan 2016 22:22:27 GMT

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/42970/#review117033
-----------------------------------------------------------


Ship it!


- Alejandro Fernandez


On Jan. 29, 2016, 7:28 p.m., Jonathan Hurley wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/42970/
> -----------------------------------------------------------
> 
> (Updated Jan. 29, 2016, 7:28 p.m.)
> 
> 
> Review request for Ambari, Alejandro Fernandez, Eugene Chekanskiy, and Nate Cole.
> 
> 
> Bugs: AMBARI-14847
>     https://issues.apache.org/jira/browse/AMBARI-14847
> 
> 
> Repository: ambari
> 
> 
> Description
> -------
> 
> The alerts framework on each Ambari Agent runs alerts in a thread pool when their jobs trigger. As a result, the following error can appear at random and send the alert to CRITICAL:
> 
> {noformat}
>  Connection failed to http://nat-rare-21-dvitiiuk-2-5.novalocal:8088 (Execution of '/usr/bin/kinit -l 5m -c /var/lib/ambari-agent/tmp/web_alert_cc_f3f99363c3b7d1667f1287ce3a35aa52 -kt /etc/security/keytabs/spnego.service.keytab HTTP/nat-rare-21-dvitiiuk-2-5.novalocal@EXAMPLE.COM > /dev/null' returned 1.
> 
> kinit: Internal credentials cache error while storing credentials while getting initial credentials)
> {noformat}
> 
> The alerts would randomly go CRITICAL at the end of their ticket expiration time only to become OK again shortly after.
> 
> The cause is that the {{kinit}} command being executed to create new credentials cannot be run concurrently for the same user.
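
The serialization idea described above can be sketched as follows. This is only an illustration, assuming Python 3 and a single agent process; the names `_KINIT_LOCK`, `run_serialized`, `_FakeKinit`, and `demo` are hypothetical and are not taken from the patch's `global_lock.py`. A stand-in callable replaces the real `kinit` invocation so the example runs without Kerberos.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

# Hypothetical process-wide lock shared by every alert thread.
# Assumption: all kinit calls for a given user happen inside one process.
_KINIT_LOCK = threading.RLock()


def run_serialized(command, lock=_KINIT_LOCK):
    """Run `command` while holding `lock`, so only one thread runs it at a time."""
    with lock:
        return command()


class _FakeKinit:
    """Stand-in for a kinit call that records how many callers overlap."""

    def __init__(self):
        self._guard = threading.Lock()
        self.active = 0
        self.max_active = 0
        self.calls = 0

    def __call__(self):
        with self._guard:
            self.active += 1
            self.calls += 1
            self.max_active = max(self.max_active, self.active)
        time.sleep(0.01)  # simulate the time kinit spends writing the cache
        with self._guard:
            self.active -= 1
        return 0  # mimic a successful exit code


def demo(workers=4, jobs=8):
    """Submit `jobs` fake kinit calls to a thread pool, all guarded by the lock."""
    fake = _FakeKinit()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(lambda _: run_serialized(fake), range(jobs)))
    return fake.max_active, fake.calls, results
```

Calling `demo()` returns the highest number of overlapping calls observed, the total call count, and the per-call return codes. A `threading` lock only serializes threads within one process; separate processes running `kinit` for the same user would need a different mechanism, such as a file lock.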
> 
> 
> Diffs
> -----
> 
>   ambari-common/src/main/python/resource_management/core/global_lock.py PRE-CREATION
>   ambari-common/src/main/python/resource_management/libraries/functions/curl_krb_request.py b42a8a3
>   ambari-common/src/main/python/resource_management/libraries/functions/hive_check.py aacb176
>   ambari-server/src/main/resources/common-services/HIVE/0.12.0.2.0/package/alerts/alert_hive_metastore.py dbf0600
>   ambari-server/src/main/resources/common-services/HIVE/0.12.0.2.0/package/alerts/alert_webhcat_server.py 1e95703
>   ambari-server/src/main/resources/common-services/OOZIE/4.0.0.2.0/package/alerts/alert_check_oozie_server.py fcc2d49
>   ambari-server/src/test/python/TestGlobalLock.py PRE-CREATION
> 
> Diff: https://reviews.apache.org/r/42970/diff/
> 
> 
> Testing
> -------
> 
> Deployed to a cluster experiencing the issue.
> 
> ----------------------------------------------------------------------
> Total run:868
> Total errors:0
> Total failures:0
> OK
> 
> 
> Thanks,
> 
> Jonathan Hurley
> 
>