ambari-dev mailing list archives

From "Alejandro Fernandez (JIRA)" <>
Subject [jira] [Created] (AMBARI-9717) Kafka & Spark service checks fail intermittently on kerberized cluster
Date Fri, 20 Feb 2015 01:12:11 GMT
Alejandro Fernandez created AMBARI-9717:

             Summary: Kafka & Spark service checks fail intermittently on kerberized cluster
                 Key: AMBARI-9717
             Project: Ambari
          Issue Type: Bug
          Components: ambari-server
    Affects Versions: 2.0.0
            Reporter: Alejandro Fernandez
            Assignee: Alejandro Fernandez
             Fix For: 2.0.0

Impact: Prevents RU from completing successfully
Frequency: reproduces often

I ran into this while performing an RU, after the following steps:
* Installed a 3-node cluster with ambari build #427
* Installed HDP on CentOS 6
* Added HDFS and ZK
* Added NameNode HA
* Added all services (including Spark and Ranger)
* Kerberized the cluster (failed to start due to AMS service check)
* Registered repo HDP
* Performed an RU

Running kafka create topic command
2015-02-18 03:29:51,851 - u'Execute[\'source /etc/kafka/conf/ ; /usr/hdp/current/kafka-broker//bin/
--create --topic ambari_kafka_service_check --partitions 1 --replication-factor 1 | grep \'Created
topic "ambari_kafka_service_check".\\|Topic "ambari_kafka_service_check" already exists.\'\']'
{'logoutput': True}
2015-02-18 03:29:54,183 - Error while executing command 'service_check':
Traceback (most recent call last):
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/",
line 208, in execute
  File "/var/lib/ambari-agent/cache/common-services/KAFKA/",
line 37, in service_check
  File "/usr/lib/python2.6/site-packages/resource_management/core/", line 148, in __init__
  File "/usr/lib/python2.6/site-packages/resource_management/core/", line 152,
in run
    self.run_action(resource, action)
  File "/usr/lib/python2.6/site-packages/resource_management/core/", line 118,
in run_action
  File "/usr/lib/python2.6/site-packages/resource_management/core/providers/", line
276, in action_run
    raise ex
Fail: Execution of 'source /etc/kafka/conf/ ; /usr/hdp/current/kafka-broker//bin/
--create --topic ambari_kafka_service_check --partitions 1 --replication-factor 1 | grep 'Created
topic "ambari_kafka_service_check".\|Topic "ambari_kafka_service_check" already exists.''
returned 1.

It turns out that the Kafka topic command can return a nonzero exit code, which is valid,
so the service check should validate the command output against a regular expression
rather than rely on the exit code.
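
A minimal sketch of that idea, not the actual patch: the accepted strings are copied from the grep pattern in the log above, while the function names output_is_valid and run_and_validate are hypothetical. It is written for Python 3, whereas the agent in the traceback runs Python 2.6.

```python
import re
import subprocess

# Accepted outcomes, copied from the grep pattern in the service check log.
EXPECTED_OUTPUT = re.compile(
    r'Created topic "ambari_kafka_service_check"\.'
    r'|Topic "ambari_kafka_service_check" already exists\.'
)


def output_is_valid(output):
    """Return True if the command output contains an accepted message."""
    return EXPECTED_OUTPUT.search(output) is not None


def run_and_validate(command):
    """Run a shell command and judge it by its output, ignoring the exit code.

    Hypothetical helper: `command` is whatever topic-creation command line
    the service check would normally execute.
    """
    proc = subprocess.run(
        command,
        shell=True,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
    )
    # proc.returncode is deliberately not checked here.
    return output_is_valid(proc.stdout)
```

With this shape, a nonzero exit code alone does not fail the check; only output that matches neither accepted message does.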

For Spark, the service check was not running kinit before executing.
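
As a rough illustration only (not the Ambari code), obtaining a ticket before the check could look like the sketch below. The helper names, the security_enabled flag, and the keytab and principal arguments are all hypothetical; the only real interface assumed is the standard `kinit -kt <keytab> <principal>` invocation.

```python
import subprocess


def build_kinit_command(keytab_path, principal, kinit_path="kinit"):
    """Build the argument list for obtaining a ticket from a keytab."""
    return [kinit_path, "-kt", keytab_path, principal]


def run_check_with_kinit(keytab_path, principal, check_command,
                         security_enabled=True):
    """Run kinit first on a kerberized cluster, then the check itself.

    `check_command` is an argument list for the real service check;
    all parameters here are placeholders for illustration.
    """
    if security_enabled:
        # Raises CalledProcessError if the ticket cannot be obtained.
        subprocess.check_call(build_kinit_command(keytab_path, principal))
    return subprocess.call(check_command)
```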
