ambari-issues mailing list archives

From "Hadoop QA (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (AMBARI-19204) Metrics monitor start failed after deleting AMS and reinstalling with different user
Date Wed, 14 Dec 2016 21:43:58 GMT

    [ https://issues.apache.org/jira/browse/AMBARI-19204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15749576#comment-15749576 ]

Hadoop QA commented on AMBARI-19204:
------------------------------------

{color:red}-1 overall{color}.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12843283/AMBARI-19204.patch
  against trunk revision .

    {color:green}+1 @author{color}.  The patch does not contain any @author tags.

    {color:red}-1 tests included{color}.  The patch doesn't appear to include any new or modified
tests.
                        Please justify why no new tests are needed for this patch.
                        Also please list what manual steps were performed to verify this patch.

    {color:green}+1 javac{color}.  The applied patch does not increase the total number of
javac compiler warnings.

    {color:green}+1 release audit{color}.  The applied patch does not increase the total number
of release audit warnings.

    {color:red}-1 core tests{color}.  The test build failed in ambari-server 

Test results: https://builds.apache.org/job/Ambari-trunk-test-patch/9671//testReport/
Console output: https://builds.apache.org/job/Ambari-trunk-test-patch/9671//console

This message is automatically generated.

> Metrics monitor start failed after deleting AMS and reinstalling with different user
> ------------------------------------------------------------------------------------
>
>                 Key: AMBARI-19204
>                 URL: https://issues.apache.org/jira/browse/AMBARI-19204
>             Project: Ambari
>          Issue Type: Bug
>          Components: ambari-metrics
>    Affects Versions: 2.5.0
>            Reporter: Aravindan Vijayan
>            Assignee: Aravindan Vijayan
>             Fix For: 2.5.0
>
>         Attachments: AMBARI-19204.patch
>
>
> STR: 
> 1) Delete the AMS service along with Tez, HBase, Sqoop, Oozie, Falcon, Storm, Ambari Infra, Kafka, Knox, Log Search, SmartSense, Mahout, and Slider
> 2) Add all the deleted services back
> The metrics monitor then fails to start with 
> {noformat}
> Traceback (most recent call last):
>   File "/var/lib/ambari-agent/cache/common-services/AMBARI_METRICS/0.1.0/package/scripts/metrics_monitor.py",
line 68, in <module>
>     AmsMonitor().execute()
>   File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py",
line 282, in execute
>     method(env)
>   File "/var/lib/ambari-agent/cache/common-services/AMBARI_METRICS/0.1.0/package/scripts/metrics_monitor.py",
line 42, in start
>     action = 'start'
>   File "/usr/lib/python2.6/site-packages/ambari_commons/os_family_impl.py", line 89,
in thunk
>     return fn(*args, **kwargs)
>   File "/var/lib/ambari-agent/cache/common-services/AMBARI_METRICS/0.1.0/package/scripts/ams_service.py",
line 103, in ams_service
>     user=params.ams_user
>   File "/usr/lib/python2.6/site-packages/resource_management/core/base.py", line 155,
in __init__
>     self.env.run()
>   File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line
160, in run
>     self.run_action(resource, action)
>   File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line
124, in run_action
>     provider_action()
>   File "/usr/lib/python2.6/site-packages/resource_management/core/providers/system.py",
line 262, in action_run
>     tries=self.resource.tries, try_sleep=self.resource.try_sleep)
>   File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 72,
in inner
>     result = function(command, **kwargs)
>   File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 102,
in checked_call
>     tries=tries, try_sleep=try_sleep, timeout_kill_strategy=timeout_kill_strategy)
>   File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 150,
in _call_wrapper
>     result = _call(command, **kwargs_copy)
>   File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 303,
in _call
>     raise ExecutionFailed(err_msg, code, out, err)
> resource_management.core.exceptions.ExecutionFailed: Execution of '/usr/sbin/ambari-metrics-monitor
--config /etc/ambari-metrics-monitor/conf start' returned 255. ######## Hortonworks #############
> This is MOTD message, added for testing in qe infra
> psutil build directory is not empty, continuing...
> Verifying Python version compatibility...
> Using python  /usr/bin/python2.6
> Checking for previously running Metric Monitor...
> Starting ambari-metrics-monitor
> /usr/sbin/ambari-metrics-monitor: line 148: /grid/0/log/metric_monitor/ambari-metrics-monitor.out:
Permission denied
> Verifying ambari-metrics-monitor process status...
> ERROR: ambari-metrics-monitor start failed. For more details, see /grid/0/log/metric_monitor/ambari-metrics-monitor.out:
> ====================
> 2016-12-14 05:37:41,956 [ERROR] host_info.py:194 - Failed to read disk_usage for a mountpoint
: [Errno 13] Permission denied: '/ycloud-grid/0/hadoop/yarn/local/usercache/root/appcache/application_1481604818073_0640'
> 2016-12-14 05:37:41,956 [ERROR] host_info.py:194 - Failed to read disk_usage for a mountpoint
: [Errno 13] Permission denied: '/ycloud-grid/0/hadoop/yarn/local/usercache/root/appcache/application_1481604818073_0640/container_e83_1481604818073_0640_01_000007'
> 2016-12-14 05:37:51,956 [ERROR] host_info.py:194 - Failed to read disk_usage for a mountpoint
: [Errno 13] Permission denied: '/ycloud-grid/0/hadoop/yarn/local/usercache/root/appcache/application_1481604818073_0640'
> 2016-12-14 05:37:51,956 [ERROR] host_info.py:194 - Failed to read disk_usage for a mountpoint
: [Errno 13] Permission denied: '/ycloud-grid/0/hadoop/yarn/local/usercache/root/appcache/application_1481604818073_0640/container_e83_1481604818073_0640_01_000007'
> 2016-12-14 05:38:01,957 [ERROR] host_info.py:194 - Failed to read disk_usage for a mountpoint
: [Errno 13] Permission denied: '/ycloud-grid/0/hadoop/yarn/local/usercache/root/appcache/application_1481604818073_0640'
> 2016-12-14 05:38:01,957 [ERROR] host_info.py:194 - Failed to read disk_usage for a mountpoint
: [Errno 13] Permission denied: '/ycloud-grid/0/hadoop/yarn/local/usercache/root/appcache/application_1481604818073_0640/container_e83_1481604818073_0640_01_000007'
> 2016-12-14 05:38:11,958 [ERROR] host_info.py:194 - Failed to read disk_usage for a mountpoint
: [Errno 13] Permission denied: '/ycloud-grid/0/hadoop/yarn/local/usercache/root/appcache/application_1481604818073_0640'
> 2016-12-14 05:38:11,958 [ERROR] host_info.py:194 - Failed to read disk_usage for a mountpoint
: [Errno 13] Permission denied: '/ycloud-grid/0/hadoop/yarn/local/usercache/root/appcache/application_1481604818073_0640/container_e83_1481604818073_0640_01_000007'
> 2016-12-14 05:38:21,959 [ERROR] host_info.py:194 - Failed to read disk_usage for a mountpoint
: [Errno 13] Permission denied: '/ycloud-grid/0/hadoop/yarn/local/usercache/root/appcache/application_1481604818073_0640'
> 2016-12-14 05:38:21,959 [ERROR] host_info.py:194 - Failed to read disk_usage for a mountpoint
: [Errno 13] Permission denied: '/ycloud-grid/0/hadoop/yarn/local/usercache/root/appcache/application_1481604818073_0640/container_e83_1481604818073_0640_01_000007'
> ====================
> Monitor out at: /grid/0/log/metric_monitor/ambari-metrics-monitor.out
> stdout:   /var/lib/ambari-agent/data/output-1028.txt
> 2016-12-14 06:12:10,119 - Using hadoop conf dir: /usr/hdp/current/hadoop-client/conf
> 2016-12-14 06:12:10,432 - Using hadoop conf dir: /usr/hdp/current/hadoop-client/conf
> 2016-12-14 06:12:10,433 - Group['cstm-knox-group'] {}
> 2016-12-14 06:12:10,434 - Group['hadoop'] {}
> 2016-12-14 06:12:10,435 - Group['users'] {}
> 2016-12-14 06:12:10,435 - User['zookeeper'] {'gid': 'hadoop', 'fetch_nonlocal_groups':
True, 'groups': ['hadoop']}
> 2016-12-14 06:12:10,436 - User['infra-solr'] {'gid': 'hadoop', 'fetch_nonlocal_groups':
True, 'groups': ['hadoop']}
> 2016-12-14 06:12:10,437 - User['cstm-sqoop'] {'gid': 'hadoop', 'fetch_nonlocal_groups':
True, 'groups': ['hadoop']}
> 2016-12-14 06:12:10,438 - User['cstm-ams'] {'gid': 'hadoop', 'fetch_nonlocal_groups':
True, 'groups': ['hadoop']}
> 2016-12-14 06:12:10,439 - User['cstm-tez'] {'gid': 'hadoop', 'fetch_nonlocal_groups':
True, 'groups': ['users']}
> 2016-12-14 06:12:10,441 - User['cstm-storm'] {'gid': 'hadoop', 'fetch_nonlocal_groups':
True, 'groups': ['hadoop']}
> 2016-12-14 06:12:10,442 - User['cstm-knox'] {'gid': 'hadoop', 'fetch_nonlocal_groups':
True, 'groups': ['hadoop']}
> 2016-12-14 06:12:10,443 - User['cstm-flume'] {'gid': 'hadoop', 'fetch_nonlocal_groups':
True, 'groups': ['hadoop']}
> 2016-12-14 06:12:10,444 - User['cstm-mahout'] {'gid': 'hadoop', 'fetch_nonlocal_groups':
True, 'groups': ['hadoop']}
> 2016-12-14 06:12:10,444 - User['cstm-hbase'] {'gid': 'hadoop', 'fetch_nonlocal_groups':
True, 'groups': ['hadoop']}
> 2016-12-14 06:12:10,445 - User['logsearch'] {'gid': 'hadoop', 'fetch_nonlocal_groups':
True, 'groups': ['hadoop']}
> 2016-12-14 06:12:10,446 - User['cstm-falcon'] {'gid': 'hadoop', 'fetch_nonlocal_groups':
True, 'groups': ['users']}
> 2016-12-14 06:12:10,447 - User['ambari-qa'] {'gid': 'hadoop', 'fetch_nonlocal_groups':
True, 'groups': ['users']}
> 2016-12-14 06:12:10,448 - User['kafka'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True,
'groups': ['hadoop']}
> 2016-12-14 06:12:10,449 - User['hdfs'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True,
'groups': ['hadoop']}
> 2016-12-14 06:12:10,450 - User['cstm-oozie'] {'gid': 'hadoop', 'fetch_nonlocal_groups':
True, 'groups': ['users']}
> 2016-12-14 06:12:10,451 - User['yarn'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True,
'groups': ['hadoop']}
> 2016-12-14 06:12:10,452 - User['mapred'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True,
'groups': ['hadoop']}
> 2016-12-14 06:12:10,453 - File['/var/lib/ambari-agent/tmp/changeUid.sh'] {'content':
StaticFile('changeToSecureUid.sh'), 'mode': 0555}
> 2016-12-14 06:12:10,612 - Execute['/var/lib/ambari-agent/tmp/changeUid.sh ambari-qa /tmp/hadoop-ambari-qa,/tmp/hsperfdata_ambari-qa,/home/ambari-qa,/tmp/ambari-qa,/tmp/sqoop-ambari-qa']
{'not_if': '(test $(id -u ambari-qa) -gt 1000) || (false)'}
> 2016-12-14 06:12:10,626 - Skipping Execute['/var/lib/ambari-agent/tmp/changeUid.sh ambari-qa
/tmp/hadoop-ambari-qa,/tmp/hsperfdata_ambari-qa,/home/ambari-qa,/tmp/ambari-qa,/tmp/sqoop-ambari-qa']
due to not_if
> 2016-12-14 06:12:10,627 - Directory['/tmp/hbase-hbase'] {'owner': 'cstm-hbase', 'create_parents':
True, 'mode': 0775, 'cd_access': 'a'}
> 2016-12-14 06:12:10,826 - File['/var/lib/ambari-agent/tmp/changeUid.sh'] {'content':
StaticFile('changeToSecureUid.sh'), 'mode': 0555}
> 2016-12-14 06:12:10,963 - Execute['/var/lib/ambari-agent/tmp/changeUid.sh cstm-hbase
/home/cstm-hbase,/tmp/cstm-hbase,/usr/bin/cstm-hbase,/var/log/cstm-hbase,/tmp/hbase-hbase']
{'not_if': '(test $(id -u cstm-hbase) -gt 1000) || (false)'}
> 2016-12-14 06:12:10,983 - Skipping Execute['/var/lib/ambari-agent/tmp/changeUid.sh cstm-hbase
/home/cstm-hbase,/tmp/cstm-hbase,/usr/bin/cstm-hbase,/var/log/cstm-hbase,/tmp/hbase-hbase']
due to not_if
> 2016-12-14 06:12:10,984 - Group['hdfs'] {}
> 2016-12-14 06:12:10,984 - User['hdfs'] {'fetch_nonlocal_groups': True, 'groups': ['hadoop',
'hdfs']}
> 2016-12-14 06:12:10,985 - FS Type: 
> 2016-12-14 06:12:10,985 - Directory['/etc/hadoop'] {'mode': 0755}
> 2016-12-14 06:12:11,068 - File['/usr/hdp/current/hadoop-client/conf/hadoop-env.sh'] {'content':
InlineTemplate(...), 'owner': 'root', 'group': 'hadoop'}
> 2016-12-14 06:12:11,192 - Directory['/var/lib/ambari-agent/tmp/hadoop_java_io_tmpdir']
{'owner': 'hdfs', 'group': 'hadoop', 'mode': 01777}
> 2016-12-14 06:12:11,296 - Execute[('setenforce', '0')] {'not_if': '(! which getenforce
) || (which getenforce && getenforce | grep -q Disabled)', 'sudo': True, 'only_if':
'test -f /selinux/enforce'}
> 2016-12-14 06:12:11,317 - Skipping Execute[('setenforce', '0')] due to not_if
> 2016-12-14 06:12:11,317 - Directory['/grid/0/log/hdfs'] {'owner': 'root', 'create_parents':
True, 'group': 'hadoop', 'mode': 0775, 'cd_access': 'a'}
> 2016-12-14 06:12:11,603 - Directory['/grid/0/pid/hdfs'] {'owner': 'root', 'create_parents':
True, 'group': 'root', 'cd_access': 'a'}
> 2016-12-14 06:12:11,671 - Changing owner for /grid/0/pid/hdfs from 1021 to root
> 2016-12-14 06:12:11,671 - Changing group for /grid/0/pid/hdfs from 1006 to root
> 2016-12-14 06:12:11,861 - Directory['/tmp/hadoop-hdfs'] {'owner': 'hdfs', 'create_parents':
True, 'cd_access': 'a'}
> 2016-12-14 06:12:12,019 - File['/usr/hdp/current/hadoop-client/conf/commons-logging.properties']
{'content': Template('commons-logging.properties.j2'), 'owner': 'root'}
> 2016-12-14 06:12:12,143 - File['/usr/hdp/current/hadoop-client/conf/health_check'] {'content':
Template('health_check.j2'), 'owner': 'root'}
> 2016-12-14 06:12:12,248 - File['/usr/hdp/current/hadoop-client/conf/log4j.properties']
{'content': ..., 'owner': 'hdfs', 'group': 'hadoop', 'mode': 0644}
> 2016-12-14 06:12:12,380 - File['/usr/hdp/current/hadoop-client/conf/hadoop-metrics2.properties']
{'content': InlineTemplate(...), 'owner': 'hdfs', 'group': 'hadoop'}
> 2016-12-14 06:12:12,482 - File['/usr/hdp/current/hadoop-client/conf/task-log4j.properties']
{'content': StaticFile('task-log4j.properties'), 'mode': 0755}
> 2016-12-14 06:12:12,597 - File['/usr/hdp/current/hadoop-client/conf/configuration.xsl']
{'owner': 'hdfs', 'group': 'hadoop'}
> 2016-12-14 06:12:12,672 - File['/etc/hadoop/conf/topology_mappings.data'] {'owner': 'hdfs',
'content': Template('topology_mappings.data.j2'), 'only_if': 'test -d /etc/hadoop/conf', 'group':
'hadoop'}
> 2016-12-14 06:12:12,823 - File['/etc/hadoop/conf/topology_script.py'] {'content': StaticFile('topology_script.py'),
'only_if': 'test -d /etc/hadoop/conf', 'mode': 0755}
> 2016-12-14 06:12:13,461 - Using hadoop conf dir: /usr/hdp/current/hadoop-client/conf
> 2016-12-14 06:12:13,466 - checked_call['hostid'] {}
> 2016-12-14 06:12:13,485 - checked_call returned (0, '1bac0d12')
> 2016-12-14 06:12:13,488 - Directory['/etc/ambari-metrics-monitor/conf'] {'owner': 'cstm-ams',
'group': 'hadoop', 'create_parents': True}
> 2016-12-14 06:12:13,581 - Directory['/grid/0/log/metric_monitor'] {'owner': 'cstm-ams',
'group': 'hadoop', 'create_parents': True, 'mode': 0755}
> 2016-12-14 06:12:13,693 - Directory['/grid/0/pid/metric_monitor'] {'owner': 'cstm-ams',
'group': 'hadoop', 'create_parents': True, 'mode': 0755, 'cd_access': 'a'}
> 2016-12-14 06:12:13,971 - Directory['/usr/lib/python2.6/site-packages/resource_monitoring/psutil/build']
{'owner': 'cstm-ams', 'group': 'hadoop', 'create_parents': True, 'cd_access': 'a'}
> 2016-12-14 06:12:14,387 - Execute['ambari-sudo.sh chown -R cstm-ams:hadoop /usr/lib/python2.6/site-packages/resource_monitoring']
{}
> 2016-12-14 06:12:14,411 - TemplateConfig['/etc/ambari-metrics-monitor/conf/metric_monitor.ini']
{'owner': 'cstm-ams', 'template_tag': None, 'group': 'hadoop'}
> 2016-12-14 06:12:14,421 - File['/etc/ambari-metrics-monitor/conf/metric_monitor.ini']
{'content': Template('metric_monitor.ini.j2'), 'owner': 'cstm-ams', 'group': 'hadoop', 'mode':
None}
> 2016-12-14 06:12:14,549 - TemplateConfig['/etc/ambari-metrics-monitor/conf/metric_groups.conf']
{'owner': 'cstm-ams', 'template_tag': None, 'group': 'hadoop'}
> 2016-12-14 06:12:14,551 - File['/etc/ambari-metrics-monitor/conf/metric_groups.conf']
{'content': Template('metric_groups.conf.j2'), 'owner': 'cstm-ams', 'group': 'hadoop', 'mode':
None}
> 2016-12-14 06:12:14,672 - File['/etc/ambari-metrics-monitor/conf/ams-env.sh'] {'content':
InlineTemplate(...), 'owner': 'cstm-ams'}
> 2016-12-14 06:12:14,814 - Execute['/usr/sbin/ambari-metrics-monitor --config /etc/ambari-metrics-monitor/conf
start'] {'user': 'cstm-ams'}
> 2016-12-14 06:12:16,884 - Execute['find /grid/0/log/metric_monitor -maxdepth 1 -type
f -name '*' -exec echo '==> {} <==' \; -exec tail -n 40 {} \;'] {'logoutput': True,
'ignore_failures': True, 'user': 'cstm-ams'}
> ######## Hortonworks #############
> This is MOTD message, added for testing in qe infra
> ==> /grid/0/log/metric_monitor/ambari-metrics-monitor.out <==
> 2016-12-14 05:35:21,946 [ERROR] host_info.py:194 - Failed to read disk_usage for a mountpoint
: [Errno 13] Permission denied: '/ycloud-grid/0/hadoop/yarn/local/usercache/root/appcache/application_1481604818073_0640/container_e83_1481604818073_0640_01_000007'
> 2016-12-14 05:35:27,256 [INFO] emitter.py:152 - Calculated collector shard based on hostname
: ctr-e83-1481604818073-0640-01-000006.hwx.site
> {noformat}
> NOTE: During the initial cluster installation, AMS was installed as the user ams, but when AMS was re-added it was configured with a custom user (cstm-ams)
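The NOTE above points at the root cause: after the user change, files under /grid/0/log/metric_monitor (notably ambari-metrics-monitor.out) are still owned by the original ams user, so cstm-ams hits "Permission denied" when the start script redirects output there. A minimal sketch of a pre-flight writability check that would surface such stale ownership before launching the monitor (hypothetical helper, assuming the paths from the log; not the actual AMBARI-19204 patch):

```python
import os

def unwritable_paths(paths):
    """Return the given paths that exist but are not writable by the
    current user -- e.g. a log file or directory left owned by a
    previous service user (hypothetical check, not the real patch)."""
    return [p for p in paths if os.path.exists(p) and not os.access(p, os.W_OK)]

# Example: run as the service user before starting the monitor.
# stale = unwritable_paths([
#     "/grid/0/log/metric_monitor",
#     "/grid/0/log/metric_monitor/ambari-metrics-monitor.out",
#     "/grid/0/pid/metric_monitor",
# ])
```

In the failing run, such a check executed as cstm-ams would have returned the stale .out file, pointing directly at the permission problem instead of the opaque exit code 255.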



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
