ambari-issues mailing list archives

From "Aravindan Vijayan (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (AMBARI-19204) Metrics monitor start failed after deleting AMS and reinstalling with different user
Date Wed, 14 Dec 2016 20:05:58 GMT

     [ https://issues.apache.org/jira/browse/AMBARI-19204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aravindan Vijayan updated AMBARI-19204:
---------------------------------------
    Status: Patch Available  (was: Open)

> Metrics monitor start failed after deleting AMS and reinstalling with different user
> ------------------------------------------------------------------------------------
>
>                 Key: AMBARI-19204
>                 URL: https://issues.apache.org/jira/browse/AMBARI-19204
>             Project: Ambari
>          Issue Type: Bug
>          Components: ambari-metrics
>    Affects Versions: 2.5.0
>            Reporter: Aravindan Vijayan
>            Assignee: Aravindan Vijayan
>             Fix For: 2.5.0
>
>         Attachments: AMBARI-19204.patch
>
>
> Steps to reproduce (STR):
> 1) Delete Service AMS along with Tez, HBase, Sqoop, Oozie, Falcon, Storm, Ambari Infra, Ambari Metrics, Kafka, Knox, Log Search, Smartsense, Mahout, Slider
> 2) Add all the deleted services back
> Metrics monitor fails to start with:
> {noformat}
> Traceback (most recent call last):
>   File "/var/lib/ambari-agent/cache/common-services/AMBARI_METRICS/0.1.0/package/scripts/metrics_monitor.py", line 68, in <module>
>     AmsMonitor().execute()
>   File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 282, in execute
>     method(env)
>   File "/var/lib/ambari-agent/cache/common-services/AMBARI_METRICS/0.1.0/package/scripts/metrics_monitor.py", line 42, in start
>     action = 'start'
>   File "/usr/lib/python2.6/site-packages/ambari_commons/os_family_impl.py", line 89, in thunk
>     return fn(*args, **kwargs)
>   File "/var/lib/ambari-agent/cache/common-services/AMBARI_METRICS/0.1.0/package/scripts/ams_service.py", line 103, in ams_service
>     user=params.ams_user
>   File "/usr/lib/python2.6/site-packages/resource_management/core/base.py", line 155, in __init__
>     self.env.run()
>   File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 160, in run
>     self.run_action(resource, action)
>   File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 124, in run_action
>     provider_action()
>   File "/usr/lib/python2.6/site-packages/resource_management/core/providers/system.py", line 262, in action_run
>     tries=self.resource.tries, try_sleep=self.resource.try_sleep)
>   File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 72, in inner
>     result = function(command, **kwargs)
>   File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 102, in checked_call
>     tries=tries, try_sleep=try_sleep, timeout_kill_strategy=timeout_kill_strategy)
>   File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 150, in _call_wrapper
>     result = _call(command, **kwargs_copy)
>   File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 303, in _call
>     raise ExecutionFailed(err_msg, code, out, err)
> resource_management.core.exceptions.ExecutionFailed: Execution of '/usr/sbin/ambari-metrics-monitor --config /etc/ambari-metrics-monitor/conf start' returned 255. ######## Hortonworks #############
> This is MOTD message, added for testing in qe infra
> psutil build directory is not empty, continuing...
> Verifying Python version compatibility...
> Using python  /usr/bin/python2.6
> Checking for previously running Metric Monitor...
> Starting ambari-metrics-monitor
> /usr/sbin/ambari-metrics-monitor: line 148: /grid/0/log/metric_monitor/ambari-metrics-monitor.out: Permission denied
> Verifying ambari-metrics-monitor process status...
> ERROR: ambari-metrics-monitor start failed. For more details, see /grid/0/log/metric_monitor/ambari-metrics-monitor.out:
> ====================
> 2016-12-14 05:37:41,956 [ERROR] host_info.py:194 - Failed to read disk_usage for a mountpoint : [Errno 13] Permission denied: '/ycloud-grid/0/hadoop/yarn/local/usercache/root/appcache/application_1481604818073_0640'
> 2016-12-14 05:37:41,956 [ERROR] host_info.py:194 - Failed to read disk_usage for a mountpoint : [Errno 13] Permission denied: '/ycloud-grid/0/hadoop/yarn/local/usercache/root/appcache/application_1481604818073_0640/container_e83_1481604818073_0640_01_000007'
> 2016-12-14 05:37:51,956 [ERROR] host_info.py:194 - Failed to read disk_usage for a mountpoint : [Errno 13] Permission denied: '/ycloud-grid/0/hadoop/yarn/local/usercache/root/appcache/application_1481604818073_0640'
> 2016-12-14 05:37:51,956 [ERROR] host_info.py:194 - Failed to read disk_usage for a mountpoint : [Errno 13] Permission denied: '/ycloud-grid/0/hadoop/yarn/local/usercache/root/appcache/application_1481604818073_0640/container_e83_1481604818073_0640_01_000007'
> 2016-12-14 05:38:01,957 [ERROR] host_info.py:194 - Failed to read disk_usage for a mountpoint : [Errno 13] Permission denied: '/ycloud-grid/0/hadoop/yarn/local/usercache/root/appcache/application_1481604818073_0640'
> 2016-12-14 05:38:01,957 [ERROR] host_info.py:194 - Failed to read disk_usage for a mountpoint : [Errno 13] Permission denied: '/ycloud-grid/0/hadoop/yarn/local/usercache/root/appcache/application_1481604818073_0640/container_e83_1481604818073_0640_01_000007'
> 2016-12-14 05:38:11,958 [ERROR] host_info.py:194 - Failed to read disk_usage for a mountpoint : [Errno 13] Permission denied: '/ycloud-grid/0/hadoop/yarn/local/usercache/root/appcache/application_1481604818073_0640'
> 2016-12-14 05:38:11,958 [ERROR] host_info.py:194 - Failed to read disk_usage for a mountpoint : [Errno 13] Permission denied: '/ycloud-grid/0/hadoop/yarn/local/usercache/root/appcache/application_1481604818073_0640/container_e83_1481604818073_0640_01_000007'
> 2016-12-14 05:38:21,959 [ERROR] host_info.py:194 - Failed to read disk_usage for a mountpoint : [Errno 13] Permission denied: '/ycloud-grid/0/hadoop/yarn/local/usercache/root/appcache/application_1481604818073_0640'
> 2016-12-14 05:38:21,959 [ERROR] host_info.py:194 - Failed to read disk_usage for a mountpoint : [Errno 13] Permission denied: '/ycloud-grid/0/hadoop/yarn/local/usercache/root/appcache/application_1481604818073_0640/container_e83_1481604818073_0640_01_000007'
> ====================
> Monitor out at: /grid/0/log/metric_monitor/ambari-metrics-monitor.out
> stdout:   /var/lib/ambari-agent/data/output-1028.txt
> 2016-12-14 06:12:10,119 - Using hadoop conf dir: /usr/hdp/current/hadoop-client/conf
> 2016-12-14 06:12:10,432 - Using hadoop conf dir: /usr/hdp/current/hadoop-client/conf
> 2016-12-14 06:12:10,433 - Group['cstm-knox-group'] {}
> 2016-12-14 06:12:10,434 - Group['hadoop'] {}
> 2016-12-14 06:12:10,435 - Group['users'] {}
> 2016-12-14 06:12:10,435 - User['zookeeper'] {'gid': 'hadoop', 'fetch_nonlocal_groups':
True, 'groups': ['hadoop']}
> 2016-12-14 06:12:10,436 - User['infra-solr'] {'gid': 'hadoop', 'fetch_nonlocal_groups':
True, 'groups': ['hadoop']}
> 2016-12-14 06:12:10,437 - User['cstm-sqoop'] {'gid': 'hadoop', 'fetch_nonlocal_groups':
True, 'groups': ['hadoop']}
> 2016-12-14 06:12:10,438 - User['cstm-ams'] {'gid': 'hadoop', 'fetch_nonlocal_groups':
True, 'groups': ['hadoop']}
> 2016-12-14 06:12:10,439 - User['cstm-tez'] {'gid': 'hadoop', 'fetch_nonlocal_groups':
True, 'groups': ['users']}
> 2016-12-14 06:12:10,441 - User['cstm-storm'] {'gid': 'hadoop', 'fetch_nonlocal_groups':
True, 'groups': ['hadoop']}
> 2016-12-14 06:12:10,442 - User['cstm-knox'] {'gid': 'hadoop', 'fetch_nonlocal_groups':
True, 'groups': ['hadoop']}
> 2016-12-14 06:12:10,443 - User['cstm-flume'] {'gid': 'hadoop', 'fetch_nonlocal_groups':
True, 'groups': ['hadoop']}
> 2016-12-14 06:12:10,444 - User['cstm-mahout'] {'gid': 'hadoop', 'fetch_nonlocal_groups':
True, 'groups': ['hadoop']}
> 2016-12-14 06:12:10,444 - User['cstm-hbase'] {'gid': 'hadoop', 'fetch_nonlocal_groups':
True, 'groups': ['hadoop']}
> 2016-12-14 06:12:10,445 - User['logsearch'] {'gid': 'hadoop', 'fetch_nonlocal_groups':
True, 'groups': ['hadoop']}
> 2016-12-14 06:12:10,446 - User['cstm-falcon'] {'gid': 'hadoop', 'fetch_nonlocal_groups':
True, 'groups': ['users']}
> 2016-12-14 06:12:10,447 - User['ambari-qa'] {'gid': 'hadoop', 'fetch_nonlocal_groups':
True, 'groups': ['users']}
> 2016-12-14 06:12:10,448 - User['kafka'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True,
'groups': ['hadoop']}
> 2016-12-14 06:12:10,449 - User['hdfs'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True,
'groups': ['hadoop']}
> 2016-12-14 06:12:10,450 - User['cstm-oozie'] {'gid': 'hadoop', 'fetch_nonlocal_groups':
True, 'groups': ['users']}
> 2016-12-14 06:12:10,451 - User['yarn'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True,
'groups': ['hadoop']}
> 2016-12-14 06:12:10,452 - User['mapred'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True,
'groups': ['hadoop']}
> 2016-12-14 06:12:10,453 - File['/var/lib/ambari-agent/tmp/changeUid.sh'] {'content':
StaticFile('changeToSecureUid.sh'), 'mode': 0555}
> 2016-12-14 06:12:10,612 - Execute['/var/lib/ambari-agent/tmp/changeUid.sh ambari-qa /tmp/hadoop-ambari-qa,/tmp/hsperfdata_ambari-qa,/home/ambari-qa,/tmp/ambari-qa,/tmp/sqoop-ambari-qa']
{'not_if': '(test $(id -u ambari-qa) -gt 1000) || (false)'}
> 2016-12-14 06:12:10,626 - Skipping Execute['/var/lib/ambari-agent/tmp/changeUid.sh ambari-qa
/tmp/hadoop-ambari-qa,/tmp/hsperfdata_ambari-qa,/home/ambari-qa,/tmp/ambari-qa,/tmp/sqoop-ambari-qa']
due to not_if
> 2016-12-14 06:12:10,627 - Directory['/tmp/hbase-hbase'] {'owner': 'cstm-hbase', 'create_parents':
True, 'mode': 0775, 'cd_access': 'a'}
> 2016-12-14 06:12:10,826 - File['/var/lib/ambari-agent/tmp/changeUid.sh'] {'content':
StaticFile('changeToSecureUid.sh'), 'mode': 0555}
> 2016-12-14 06:12:10,963 - Execute['/var/lib/ambari-agent/tmp/changeUid.sh cstm-hbase
/home/cstm-hbase,/tmp/cstm-hbase,/usr/bin/cstm-hbase,/var/log/cstm-hbase,/tmp/hbase-hbase']
{'not_if': '(test $(id -u cstm-hbase) -gt 1000) || (false)'}
> 2016-12-14 06:12:10,983 - Skipping Execute['/var/lib/ambari-agent/tmp/changeUid.sh cstm-hbase
/home/cstm-hbase,/tmp/cstm-hbase,/usr/bin/cstm-hbase,/var/log/cstm-hbase,/tmp/hbase-hbase']
due to not_if
> 2016-12-14 06:12:10,984 - Group['hdfs'] {}
> 2016-12-14 06:12:10,984 - User['hdfs'] {'fetch_nonlocal_groups': True, 'groups': ['hadoop',
'hdfs']}
> 2016-12-14 06:12:10,985 - FS Type: 
> 2016-12-14 06:12:10,985 - Directory['/etc/hadoop'] {'mode': 0755}
> 2016-12-14 06:12:11,068 - File['/usr/hdp/current/hadoop-client/conf/hadoop-env.sh'] {'content':
InlineTemplate(...), 'owner': 'root', 'group': 'hadoop'}
> 2016-12-14 06:12:11,192 - Directory['/var/lib/ambari-agent/tmp/hadoop_java_io_tmpdir']
{'owner': 'hdfs', 'group': 'hadoop', 'mode': 01777}
> 2016-12-14 06:12:11,296 - Execute[('setenforce', '0')] {'not_if': '(! which getenforce
) || (which getenforce && getenforce | grep -q Disabled)', 'sudo': True, 'only_if':
'test -f /selinux/enforce'}
> 2016-12-14 06:12:11,317 - Skipping Execute[('setenforce', '0')] due to not_if
> 2016-12-14 06:12:11,317 - Directory['/grid/0/log/hdfs'] {'owner': 'root', 'create_parents':
True, 'group': 'hadoop', 'mode': 0775, 'cd_access': 'a'}
> 2016-12-14 06:12:11,603 - Directory['/grid/0/pid/hdfs'] {'owner': 'root', 'create_parents':
True, 'group': 'root', 'cd_access': 'a'}
> 2016-12-14 06:12:11,671 - Changing owner for /grid/0/pid/hdfs from 1021 to root
> 2016-12-14 06:12:11,671 - Changing group for /grid/0/pid/hdfs from 1006 to root
> 2016-12-14 06:12:11,861 - Directory['/tmp/hadoop-hdfs'] {'owner': 'hdfs', 'create_parents':
True, 'cd_access': 'a'}
> 2016-12-14 06:12:12,019 - File['/usr/hdp/current/hadoop-client/conf/commons-logging.properties']
{'content': Template('commons-logging.properties.j2'), 'owner': 'root'}
> 2016-12-14 06:12:12,143 - File['/usr/hdp/current/hadoop-client/conf/health_check'] {'content':
Template('health_check.j2'), 'owner': 'root'}
> 2016-12-14 06:12:12,248 - File['/usr/hdp/current/hadoop-client/conf/log4j.properties']
{'content': ..., 'owner': 'hdfs', 'group': 'hadoop', 'mode': 0644}
> 2016-12-14 06:12:12,380 - File['/usr/hdp/current/hadoop-client/conf/hadoop-metrics2.properties']
{'content': InlineTemplate(...), 'owner': 'hdfs', 'group': 'hadoop'}
> 2016-12-14 06:12:12,482 - File['/usr/hdp/current/hadoop-client/conf/task-log4j.properties']
{'content': StaticFile('task-log4j.properties'), 'mode': 0755}
> 2016-12-14 06:12:12,597 - File['/usr/hdp/current/hadoop-client/conf/configuration.xsl']
{'owner': 'hdfs', 'group': 'hadoop'}
> 2016-12-14 06:12:12,672 - File['/etc/hadoop/conf/topology_mappings.data'] {'owner': 'hdfs',
'content': Template('topology_mappings.data.j2'), 'only_if': 'test -d /etc/hadoop/conf', 'group':
'hadoop'}
> 2016-12-14 06:12:12,823 - File['/etc/hadoop/conf/topology_script.py'] {'content': StaticFile('topology_script.py'),
'only_if': 'test -d /etc/hadoop/conf', 'mode': 0755}
> 2016-12-14 06:12:13,461 - Using hadoop conf dir: /usr/hdp/current/hadoop-client/conf
> 2016-12-14 06:12:13,466 - checked_call['hostid'] {}
> 2016-12-14 06:12:13,485 - checked_call returned (0, '1bac0d12')
> 2016-12-14 06:12:13,488 - Directory['/etc/ambari-metrics-monitor/conf'] {'owner': 'cstm-ams',
'group': 'hadoop', 'create_parents': True}
> 2016-12-14 06:12:13,581 - Directory['/grid/0/log/metric_monitor'] {'owner': 'cstm-ams',
'group': 'hadoop', 'create_parents': True, 'mode': 0755}
> 2016-12-14 06:12:13,693 - Directory['/grid/0/pid/metric_monitor'] {'owner': 'cstm-ams',
'group': 'hadoop', 'create_parents': True, 'mode': 0755, 'cd_access': 'a'}
> 2016-12-14 06:12:13,971 - Directory['/usr/lib/python2.6/site-packages/resource_monitoring/psutil/build']
{'owner': 'cstm-ams', 'group': 'hadoop', 'create_parents': True, 'cd_access': 'a'}
> 2016-12-14 06:12:14,387 - Execute['ambari-sudo.sh chown -R cstm-ams:hadoop /usr/lib/python2.6/site-packages/resource_monitoring']
{}
> 2016-12-14 06:12:14,411 - TemplateConfig['/etc/ambari-metrics-monitor/conf/metric_monitor.ini']
{'owner': 'cstm-ams', 'template_tag': None, 'group': 'hadoop'}
> 2016-12-14 06:12:14,421 - File['/etc/ambari-metrics-monitor/conf/metric_monitor.ini']
{'content': Template('metric_monitor.ini.j2'), 'owner': 'cstm-ams', 'group': 'hadoop', 'mode':
None}
> 2016-12-14 06:12:14,549 - TemplateConfig['/etc/ambari-metrics-monitor/conf/metric_groups.conf']
{'owner': 'cstm-ams', 'template_tag': None, 'group': 'hadoop'}
> 2016-12-14 06:12:14,551 - File['/etc/ambari-metrics-monitor/conf/metric_groups.conf']
{'content': Template('metric_groups.conf.j2'), 'owner': 'cstm-ams', 'group': 'hadoop', 'mode':
None}
> 2016-12-14 06:12:14,672 - File['/etc/ambari-metrics-monitor/conf/ams-env.sh'] {'content':
InlineTemplate(...), 'owner': 'cstm-ams'}
> 2016-12-14 06:12:14,814 - Execute['/usr/sbin/ambari-metrics-monitor --config /etc/ambari-metrics-monitor/conf
start'] {'user': 'cstm-ams'}
> 2016-12-14 06:12:16,884 - Execute['find /grid/0/log/metric_monitor -maxdepth 1 -type
f -name '*' -exec echo '==> {} <==' \; -exec tail -n 40 {} \;'] {'logoutput': True,
'ignore_failures': True, 'user': 'cstm-ams'}
> ######## Hortonworks #############
> This is MOTD message, added for testing in qe infra
> ==> /grid/0/log/metric_monitor/ambari-metrics-monitor.out <==
> 2016-12-14 05:35:21,946 [ERROR] host_info.py:194 - Failed to read disk_usage for a mountpoint
: [Errno 13] Permission denied: '/ycloud-grid/0/hadoop/yarn/local/usercache/root/appcache/application_1481604818073_0640/container_e83_1481604818073_0640_01_000007'
> 2016-12-14 05:35:27,256 [INFO] emitter.py:152 - Calculated collector shard based on hostname
: ctr-e83-1481604818073-0640-01-000006.hwx.site
> {noformat}
> NOTE: During the initial cluster installation, AMS was installed as user ams, but when AMS was re-added, it was installed as a custom user (cstm-ams).
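
The NOTE and the "Permission denied" on /grid/0/log/metric_monitor/ambari-metrics-monitor.out suggest that files left behind by the earlier install may still be owned by the old ams user, so cstm-ams cannot write to them. The following is a minimal Python sketch for listing such leftovers under a directory. `find_foreign_owned` is a hypothetical helper written for illustration; it is not part of Ambari and is not what the attached patch does.

```python
import os


def find_foreign_owned(root, expected_uid):
    """Return paths under root (root included) whose owner is not expected_uid.

    Hypothetical helper for illustration only; not an Ambari API.
    """
    mismatched = []
    # lstat() inspects the entry itself and does not follow symlinks.
    if os.lstat(root).st_uid != expected_uid:
        mismatched.append(root)
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            if os.lstat(path).st_uid != expected_uid:
                mismatched.append(path)
    return mismatched


# Assumed usage on an affected host (paths and user taken from the log above):
#   import pwd
#   uid = pwd.getpwnam("cstm-ams").pw_uid
#   for path in find_foreign_owned("/grid/0/log/metric_monitor", uid):
#       print(path)
```

Any path this reports would need its ownership changed to the new service user before the monitor can write to it; whether that is the right fix for this issue is for the patch review to decide.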



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
