ambari-user mailing list archives

From Siddharth Wagle <swa...@hortonworks.com>
Subject Re: metrics visible from host_component but not component
Date Wed, 11 Nov 2015 19:18:30 GMT
Hi Nate,


I'm glad you figured out the fine-grained details. I will try to add a wiki page on this to
help others out.


The timeline.metrics.service.cluster.aggregator.appIds property controls the system-level app
metrics: in addition to the metrics you write to AMS yourself, it gives you the system metrics
(for example cpu / memory / disk) for gpfs. (You can query cpu_user or mem_free for the gpfs app.)


The case sensitivity is important; the appIds change above is optional.


- Sid


________________________________
From: Nathan Falk <nfalk@us.ibm.com>
Sent: Wednesday, November 11, 2015 7:19 AM
To: user@ambari.apache.org
Subject: Re: metrics visible from host_component but not component


OK, everybody ready for this explanation?

This boneheaded programmer finally realized today that the timestamps in my metrics didn't
quite look the same as the timestamps in the various examples I've looked at or in other services.

I'm using a python program to push the metrics to the collector, and using the time.time()
function to get the current time. This time, of course, is in seconds since the epoch.

AMS is expecting metrics from HadoopMetricsSink-derived Java classes, which use Java's
System.currentTimeMillis() method, which, of course, returns milliseconds since the epoch.

So, I changed my python program to use int(time.time()*1000.0) instead of just int(time.time())
and everything magically started working.
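
As a minimal sketch of that fix (not the actual program): the helper names now_millis and
build_payload are hypothetical, and the payload field names are simply copied from the collector
response quoted further down in this thread. Posting the payload to the collector is left out.

```python
import json
import time


def now_millis():
    """Current time as integer milliseconds since the epoch (what AMS expects)."""
    return int(time.time() * 1000.0)


def build_payload(metric_name, value, app_id, hostname, ts_ms=None):
    """Build one metric entry; every time value in it is in milliseconds.

    Field names mirror the collector response shown in this thread;
    the function itself is a hypothetical illustration.
    """
    ts_ms = now_millis() if ts_ms is None else ts_ms
    return {
        "metrics": [
            {
                "metricname": metric_name,
                "appid": app_id,          # lower case, matching the appId used elsewhere
                "hostname": hostname,
                "timestamp": ts_ms,
                "starttime": ts_ms,
                "metrics": {str(ts_ms): value},
            }
        ]
    }


if __name__ == "__main__":
    payload = build_payload("gpfs.disk_used", 1437696.0, "gpfs", "dn01-dat.ibm.com")
    print(json.dumps(payload, indent=2))
```

The point is only that starttime and the keys of the inner "metrics" map use the same
millisecond value as "timestamp".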

For completeness' sake, I should add that I also changed the service's "timelineAppid" in metainfo.xml
to use lower case "gpfs" instead of "GPFS" to be consistent with the lower case appId used
everywhere else. And I also added "gpfs" to timeline.metrics.service.cluster.aggregator.appIds.

I don't know if either of these changes was necessary to get component-level metrics working
properly.

Nate Falk
nfalk@us.ibm.com


From: Nathan Falk/Poughkeepsie/IBM@IBMUS
To: user@ambari.apache.org
Date: 11/10/2015 11:37 AM
Subject: metrics visible from host_component but not component

________________________________



I have a custom Ambari service, with a metrics.json and widgets.json defined.

The widgets display on the service dashboard summary page, but instead of the graph or data,
I see "n/a".

When I use the REST API to query the ambari server, I see the metrics for the host_component,
but not when I query the component.

In metrics.json, I've added some of the basic ams host metrics, plus some service-specific
metrics. All metrics are defined in both "Component" and "HostComponent". As an example:

{
 "GPFS_MASTER": {
   "Component": [
     {
       "type": "ganglia",
       "metrics": {
         "default": {
           "metrics/cpu/cpu_idle":{
             "metric":"cpu_idle",
             "pointInTime":true,
             "temporal":true,
             "amsHostMetric":true
           },
           ...
           "metrics/gpfs/disk_used": {
             "metric": "gpfs.disk_used",
             "pointInTime": true,
             "temporal": true
           },
           ...
         }
       }
     }
   ],
   "HostComponent": [
     {
       "type": "ganglia",
       "metrics": {
         "default": {
           "metrics/cpu/cpu_idle":{
             "metric":"cpu_idle",
             "pointInTime":true,
             "temporal":true,
             "amsHostMetric":true
           },
           ...
           "metrics/gpfs/disk_used": {
             "metric": "gpfs.disk_used",
             "pointInTime": true,
             "temporal": true
           },
           ...


I query the AMS Collector, and it seems that the metrics are there:
[root@dn01-dat nathan]# curl -X GET -u admin:admin "http://dn01:6188/ws/v1/timeline/metrics?metricNames=gpfs.disk_used&hostname=dn01-dat.ibm.com"
{"metrics":[{"timestamp":1447084964323,"metricname":"gpfs.disk_used","appid":"gpfs","hostname":"dn01-dat.ibm.com","starttime":1447084964,"metrics":{"1447084964":1437696.0}}]}
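
Read with the resolution from the reply in mind, the response above already shows the mismatch:
"timestamp" has 13 digits (milliseconds) while "starttime" and the key of the inner "metrics" map
have 10 (seconds). A rough check for this could look like the sketch below. The function name and
the 10**11 cutoff are assumptions chosen for illustration, not part of AMS: 10**11 milliseconds is
early 1973, so any current millisecond timestamp is well above it.

```python
import json


def suspicious_timestamps(response_text, threshold=10**11):
    """Return (metricname, field, value) for epoch values that look like seconds.

    Hypothetical helper: values below `threshold` are assumed to be seconds
    rather than milliseconds since the epoch.
    """
    found = []
    for m in json.loads(response_text)["metrics"]:
        candidates = [
            ("timestamp", m.get("timestamp")),
            ("starttime", m.get("starttime")),
        ]
        # Keys of the inner "metrics" map are epoch values stored as strings.
        candidates += [("metrics key", int(k)) for k in m.get("metrics", {})]
        for field, ts in candidates:
            if ts is not None and ts < threshold:
                found.append((m["metricname"], field, ts))
    return found


if __name__ == "__main__":
    sample = (
        '{"metrics":[{"timestamp":1447084964323,"metricname":"gpfs.disk_used",'
        '"appid":"gpfs","hostname":"dn01-dat.ibm.com","starttime":1447084964,'
        '"metrics":{"1447084964":1437696.0}}]}'
    )
    for name, field, ts in suspicious_timestamps(sample):
        print(name, field, ts)
```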


I query Ambari, and whether I see the metric or not depends on how I do the query. If I query
the GPFS_MASTER service component, I do NOT see the metric:
[root@dn01-dat nathan]# curl -X GET -u admin:admin "http://dn01:8080/api/v1/clusters/nate/services/GPFS/components/GPFS_MASTER?fields=metrics/gpfs/disk_used"
{
 "href" : "http://dn01:8080/api/v1/clusters/nate/services/GPFS/components/GPFS_MASTER?fields=metrics/gpfs/disk_used",
 "ServiceComponentInfo" : {
   "cluster_name" : "nate",
   "component_name" : "GPFS_MASTER",
   "service_name" : "GPFS"
 }
}

If I query the GPFS_MASTER host component on dn01, then I do see the metric:
[root@dn01-dat nathan]# curl -X GET -u admin:admin "http://dn01:8080/api/v1/clusters/nate/hosts/dn01-dat.ibm.com/host_components/GPFS_MASTER?fields=metrics/gpfs/disk_used"
{
 "href" : "http://dn01:8080/api/v1/clusters/nate/hosts/dn01-dat.ibm.com/host_components/GPFS_MASTER?fields=metrics/gpfs/disk_used",
 "HostRoles" : {
   "cluster_name" : "nate",
   "component_name" : "GPFS_MASTER",
   "host_name" : "dn01-dat.ibm.com"
 },
 "host" : {
   "href" : "http://dn01:8080/api/v1/clusters/nate/hosts/dn01-dat.ibm.com"
 },
 "metrics" : {
   "gpfs" : {
     "disk_used" : 1437696.0
   }
 }
}

By comparison, if I query the "cpu_idle" metric, also defined in the GPFS metrics.json file,
I see the metric in both queries:
[root@dn01-dat nathan]# curl -X GET -u admin:admin "http://dn01:8080/api/v1/clusters/nate/services/GPFS/components/GPFS_MASTER?fields=metrics/cpu/cpu_idle"
{
 "href" : "http://dn01:8080/api/v1/clusters/nate/services/GPFS/components/GPFS_MASTER?fields=metrics/cpu/cpu_idle",
 "ServiceComponentInfo" : {
   "cluster_name" : "nate",
   "component_name" : "GPFS_MASTER",
   "service_name" : "GPFS"
 },
 "metrics" : {
   "cpu" : {
     "cpu_idle" : 0.6248046875
   }
 }
}
[root@dn01-dat nathan]# curl -X GET -u admin:admin "http://dn01:8080/api/v1/clusters/nate/hosts/dn01-dat.ibm.com/host_components/GPFS_MASTER?fields=metrics/cpu/cpu_idle"
{
 "href" : "http://dn01:8080/api/v1/clusters/nate/hosts/dn01-dat.ibm.com/host_components/GPFS_MASTER?fields=metrics/cpu/cpu_idle",
 "HostRoles" : {
   "cluster_name" : "nate",
   "component_name" : "GPFS_MASTER",
   "host_name" : "dn01-dat.ibm.com"
 },
 "host" : {
   "href" : "http://dn01:8080/api/v1/clusters/nate/hosts/dn01-dat.ibm.com"
 },
 "metrics" : {
   "cpu" : {
     "cpu_idle" : 0.624375
   }
 }
}

I feel like getting back "n/a" on the widgets is related to not seeing the metrics when I
query the component rather than the host_component, but I'm not 100% sure about that either.

My problems don't seem to end there, either. When I create new widgets using the gpfs metrics,
I start seeing some wildly inconsistent behavior. Sometimes I'll get the right metric data,
sometimes as I add and remove widgets they'll go back to displaying n/a or even displaying
old values for the metric data.

I must be missing something really simple, but I think I'm going to need help to figure out
what that might be.

Does anyone out there have any suggestions for how to investigate this further or what I might
be missing with regard to defining or posting these metrics?

Thanks,

Nate Falk
nfalk@us.ibm.com


