ambari-dev mailing list archives

From "Tom Beerbower (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (AMBARI-1044) API is not returning Ganglia metrics for one of the hosts in the cluster
Date Fri, 30 Nov 2012 17:11:57 GMT

    [ https://issues.apache.org/jira/browse/AMBARI-1044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13507459#comment-13507459 ]

Tom Beerbower commented on AMBARI-1044:
---------------------------------------

I don't see any related exceptions in the server log, which means that either it's not attempting
to get the metrics for this host, or the metrics are just not being set on the host resource.

I think that I see what is happening. One of the arguments that can be specified for the rrd
query is the Ganglia cluster (HDPHBaseMaster, HDPJobTracker, HDPNameNode or HDPSlaves). The
question is, for a host level query which Ganglia cluster should we specify?

It's hard to say, since a host isn't necessarily associated with any of the services related to
those clusters... or it may be associated with more than one. It turns out it doesn't really matter.
In this case I can see the system level rrd files that we use for host level metrics for
ip-10-224-42-108.ec2.internal under any of the Ganglia cluster folders. For example ...
{code}
[root@ip-10-40-91-121 rrds]# ls ./HDPHBaseMaster/ip-10-224-42-108.ec2.internal
boottime.rrd  bytes_out.rrd  cpu_idle.rrd  cpu_num.rrd    cpu_system.rrd  cpu_wio.rrd    disk_total.rrd    load_five.rrd  mem_buffers.rrd  mem_free.rrd    mem_total.rrd      pkts_in.rrd   proc_run.rrd    swap_free.rrd
bytes_in.rrd  cpu_aidle.rrd  cpu_nice.rrd  cpu_speed.rrd  cpu_user.rrd    disk_free.rrd  load_fifteen.rrd  load_one.rrd   mem_cached.rrd   mem_shared.rrd  part_max_used.rrd  pkts_out.rrd  proc_total.rrd  swap_total.rrd

...

[root@ip-10-40-91-121 rrds]# ls HDPNameNode/ip-10-224-42-108.ec2.internal
boottime.rrd  bytes_out.rrd  cpu_idle.rrd  cpu_num.rrd    cpu_system.rrd  cpu_wio.rrd    disk_total.rrd    load_five.rrd  mem_buffers.rrd  mem_free.rrd    mem_total.rrd      pkts_in.rrd   proc_run.rrd    swap_free.rrd
bytes_in.rrd  cpu_aidle.rrd  cpu_nice.rrd  cpu_speed.rrd  cpu_user.rrd    disk_free.rrd  load_fifteen.rrd  load_one.rrd   mem_cached.rrd   mem_shared.rrd  part_max_used.rrd  pkts_out.rrd  proc_total.rrd  swap_total.rrd
{code}
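The directory check above could be scripted roughly as follows. This is only a sketch: the function name, the choice of cpu_user as the probe metric, and the rrds root path passed in by the caller are assumptions for illustration. Only the four Ganglia cluster names and the <cluster>/<host>/<metric>.rrd layout come from the listing above.

```python
import os

# Ganglia cluster names mentioned in this comment.
GANGLIA_CLUSTERS = ["HDPHBaseMaster", "HDPJobTracker", "HDPNameNode", "HDPSlaves"]


def clusters_with_host_metrics(rrd_root, host, metric="cpu_user"):
    """Return the Ganglia clusters whose folder contains <metric>.rrd for the host.

    rrd_root is the directory that holds one sub-folder per Ganglia cluster
    (the 'rrds' directory in the listing above); its real location depends on
    the Ganglia installation and is not stated in this thread.
    """
    found = []
    for cluster in GANGLIA_CLUSTERS:
        rrd_file = os.path.join(rrd_root, cluster, host, metric + ".rrd")
        if os.path.isfile(rrd_file):
            found.append(cluster)
    return found
```

Called with the rrds directory and ip-10-224-42-108.ec2.internal, this returns the clusters under which the system level files are present, which makes it quick to confirm whether the choice of cluster matters for a given host.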
The approach that I've been using is to look through the host components for the host that
we are interested in and try to map one of its component names back to a Ganglia cluster.
In this case it looks like the host with the missing metrics is not associated with any component
that would map back given the mapping method that I am using.

Given what I am currently seeing with the system level metrics, I think that it would be safe
to simply use HDPSlaves as the Ganglia cluster for host level queries.
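
To make the difference between the two approaches concrete, here is a minimal sketch. The component-to-cluster table and both function names are hypothetical and are not the actual Ambari server code; only the Ganglia cluster names and the HDPSlaves proposal come from this comment.

```python
# Hypothetical mapping from host component names to Ganglia clusters.
# The real mapping used by the server may differ.
COMPONENT_TO_GANGLIA_CLUSTER = {
    "NAMENODE": "HDPNameNode",
    "JOBTRACKER": "HDPJobTracker",
    "HBASE_MASTER": "HDPHBaseMaster",
    "DATANODE": "HDPSlaves",
    "TASKTRACKER": "HDPSlaves",
}


def cluster_by_component(component_names):
    """Current approach: map one of the host's components back to a cluster.

    Returns None when no component maps, which is the situation that leaves
    the host without metrics.
    """
    for name in component_names:
        cluster = COMPONENT_TO_GANGLIA_CLUSTER.get(name)
        if cluster is not None:
            return cluster
    return None


def cluster_for_host_query():
    """Proposed approach: always use HDPSlaves for host level queries."""
    return "HDPSlaves"
```

A host whose components are all absent from the table gets None from cluster_by_component, while cluster_for_host_query returns a usable cluster regardless of which components the host runs.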
                
> API is not returning Ganglia metrics for one of the hosts in the cluster
> ------------------------------------------------------------------------
>
>                 Key: AMBARI-1044
>                 URL: https://issues.apache.org/jira/browse/AMBARI-1044
>             Project: Ambari
>          Issue Type: Sub-task
>            Reporter: Tom Beerbower
>            Assignee: Tom Beerbower
>
> A cluster was deployed with 4 hosts, with Ambari Server running on a different host.
> Host graphs are showing for 3 of the hosts.
> For one of the hosts, API is not returning any temporal data.
> Ganglia is showing host-level metrics.
> UI: http://ec2-54-242-174-25.compute-1.amazonaws.com:8080/#/main/hosts/ip-10-224-42-108.ec2.internal/summary
> Ganglia UI: http://ec2-174-129-70-110.compute-1.amazonaws.com/ganglia/mobile_helper.php?show_host_metrics=1&h=ip-10-224-42-108.ec2.internal&c=HDPNameNode&r=hour&cs=&ce=
> API response:
> {
> "href" : "http://ec2-54-242-174-25.compute-1.amazonaws.com:8080/api/v1/clusters/C2/hosts/ip-10-224-42-108.ec2.internal?fields=metrics/cpu/cpu_user1354227417,1354231017,15,metrics/cpu/cpu_wio1354227417,1354231017,15,metrics/cpu/cpu_nice1354227417,1354231017,15,metrics/cpu/cpu_aidle1354227417,1354231017,15,metrics/cpu/cpu_system1354227417,1354231017,15,metrics/cpu/cpu_idle1354227417,1354231017,15",
> "Hosts" :
> { "cluster_name" : "C2", "host_name" : "ip-10-224-42-108.ec2.internal" }
> }
> We need to understand the root cause.

