ambari-dev mailing list archives

From "Siddharth Wagle (JIRA)" <>
Subject [jira] [Comment Edited] (AMBARI-5707) Replace Ganglia with high performant and pluggable Metrics System
Date Wed, 17 Sep 2014 17:01:33 GMT


Siddharth Wagle edited comment on AMBARI-5707 at 9/17/14 5:01 PM:

*Revised architecture overview*:

*Problems with current system*:
- Ganglia has limited capabilities for analyzing historical data, and new plugins are not easy to write.
- Limited ability to scale out horizontally for large clusters.
- No support for ad-hoc queries.
- Not easy to add metrics support for new services added to the stack.
- It is non-trivial to hook up existing time series databases like OpenTSDB to store raw data.

*Proposed solution*:
- Replace Ganglia with a bespoke solution based on an embedded HBase that fits all needs.
- Ability to store fine-grained data for a configurable amount of time.
- Ability to write SQL-like queries (via Phoenix) on aggregated metric data sets and visualize the results.
- Provide a pluggable storage API with the ability to forward metric data to external long-term storage.
- Ability to add user-defined metrics and visualize them through Ambari Views.

*Component description*:

- *Host metrics monitor*:
A lightweight Python process running on every managed host that collects metrics for the
managed processes running on the host, in addition to aggregate metrics for the entire host.
The collected metrics are pushed to a pre-configured metrics collector, where they are stored
for consumption by the Ambari API.
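As a rough illustration of what the monitor might push, the sketch below packages point-in-time host samples into a JSON document for the collector. The field names (`hostname`, `metrics`, and so on) and the function name are illustrative assumptions, not the actual Ambari wire format.

```python
import json
import time

def build_metrics_payload(hostname, samples):
    """Package point-in-time samples as a JSON document for the collector.

    'samples' maps a metric name to its current value; the payload layout
    here is a hypothetical sketch, not the real protocol.
    """
    now_ms = int(time.time() * 1000)
    return json.dumps({
        "hostname": hostname,
        "timestamp": now_ms,
        "metrics": [
            {"name": name, "value": value, "timestamp": now_ms}
            for name, value in samples.items()
        ],
    })

payload = build_metrics_payload("host1.example.com",
                                {"cpu_user": 12.5, "mem_free": 4096.0})
```

In practice the monitor would POST this payload to the collector endpoint on a fixed interval.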

- *Hadoop Metrics Sink*:
Implementation of the Hadoop Metrics Sink interface that pushes data to a configured collector.
As part of the implementation, allow a periodic flush of collected metrics data: _putMetric()_
should write data into a bounded buffer cache with a fixed, configurable size.
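The bounded-buffer behavior described above can be sketched as follows; this is a minimal stand-in (class and method names are illustrative, not the sink's actual API), where the oldest entries are dropped once the configured capacity is exceeded and a periodic flush drains the buffer for the push to the collector.

```python
from collections import deque
import threading

class BoundedMetricsBuffer:
    """Fixed-size cache for metric writes, a sketch of the bounded buffer
    the sink's putMetric() would fill between periodic flushes."""

    def __init__(self, capacity):
        # deque with maxlen silently evicts the oldest entry when full,
        # so a slow collector cannot grow memory without bound.
        self._buf = deque(maxlen=capacity)
        self._lock = threading.Lock()

    def put_metric(self, name, timestamp_ms, value):
        with self._lock:
            self._buf.append((name, timestamp_ms, value))

    def flush(self):
        """Drain and return all buffered points for one push to the collector."""
        with self._lock:
            drained = list(self._buf)
            self._buf.clear()
        return drained
```

A real sink would call `flush()` from a timer thread and send the drained points over HTTP.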

- *Timeline Metrics Collector*:
The metrics collector is a daemon that receives data from registered publishers and can push
the metrics data to an external metric store such as OpenTSDB or HDFS, in addition to writing
it to the local metrics store. Additionally, the collector provides the ability to plug in
aggregators for the collected metric data. Aggregation is performed post-write by aggregator
threads that run at a configured time interval and aggregate the data collected within that
interval.
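The post-write aggregation step can be sketched as follows: raw points are grouped into fixed time buckets of the configured interval, and summary statistics are computed per metric per bucket. The function name and the choice of min/max/avg/count are illustrative assumptions about what an aggregator plugin would produce.

```python
from collections import defaultdict

def aggregate(points, interval_ms):
    """Group raw (name, timestamp_ms, value) points into fixed time buckets
    and compute summary statistics for each (metric, bucket) pair."""
    buckets = defaultdict(list)
    for name, ts, value in points:
        # Align the timestamp to the start of its bucket.
        bucket_start = ts - (ts % interval_ms)
        buckets[(name, bucket_start)].append(value)
    return {
        key: {"min": min(vals), "max": max(vals),
              "avg": sum(vals) / len(vals), "count": len(vals)}
        for key, vals in buckets.items()
    }
```

An aggregator thread would run this over the points written since its last pass and store the results alongside the raw data.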

- *Timeline Metrics Store*:
A time series database is ideal for storing metrics data. The main advantage is variable time
buckets: for example, a row key indicating a metric id followed by an arbitrary number of
key-value pairs that fit into the time range identified by part of the key. This storage model
allows simple time-based aggregation and avoids sparse rows. The deployment modes of HBase
allow for scaling up and down based on cluster size. Also, choosing HBase as the default
storage allows storage to scale independently and seamlessly from the metric collectors.
Phoenix provides JDBC APIs, instead of the regular HBase client APIs, to create tables,
insert data, and query the HBase data with SQL.
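The row-key scheme described above can be illustrated as follows; the delimiter, field order, and bucket width are assumptions for the sketch, not the actual table design. The point is that all samples inside one time bucket share a row key, so columns within the row hold the individual values and rows never go sparse.

```python
def row_key(metric_id, hostname, timestamp_ms, bucket_ms=300_000):
    """Build a time-series row key: metric id and host, followed by the
    start of the time bucket the timestamp falls into (5-minute buckets
    here, purely as an example width)."""
    bucket_start = timestamp_ms - (timestamp_ms % bucket_ms)
    return f"{metric_id}|{hostname}|{bucket_start}"

# Two samples 100 seconds apart land in the same bucket, hence the same row.
k1 = row_key("cpu_user", "host1", 1_000_000)
k2 = row_key("cpu_user", "host1", 1_100_000)
```

Because the bucket start is part of the key, a time-range scan touches only the rows whose buckets overlap the range, which is what makes time-based aggregation cheap.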

- *Ambari Metrics Service*:
The API design for the Metrics Service should support a GET API using a key and a time range,
similar to what exists on the HBase cluster.
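The GET semantics can be sketched as a filter over the store by metric name and time range; here `store` is an in-memory stand-in for the timeline store, and the function name and tuple layout are illustrative assumptions.

```python
def get_metrics(store, metric_name, start_ms, end_ms):
    """Return all (timestamp, value) points for 'metric_name' whose
    timestamps fall within the inclusive range [start_ms, end_ms]."""
    return [(ts, value) for (name, ts, value) in store
            if name == metric_name and start_ms <= ts <= end_ms]

store = [("cpu_user", 10, 1.0), ("cpu_user", 20, 2.0),
         ("mem_free", 15, 3.0), ("cpu_user", 99, 4.0)]
window = get_metrics(store, "cpu_user", 10, 50)
```

A real service would translate the same key-plus-range request into a Phoenix query rather than a linear scan.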

- *Ambari Views*:
Ambari Views on top of Phoenix provide ad-hoc query capability to the user, along with a View
to replace Ganglia Web.

> Replace Ganglia with high performant and pluggable Metrics System
> -----------------------------------------------------------------
>                 Key: AMBARI-5707
>                 URL:
>             Project: Ambari
>          Issue Type: Epic
>          Components: ambari-agent, ambari-server
>    Affects Versions: 1.6.0
>            Reporter: Siddharth Wagle
>            Assignee: Siddharth Wagle
>            Priority: Critical
>         Attachments: MetricsArchLatest.png, Revised archtecture diagram.png
> *Ambari Metrics System*
> - Ability to collect metrics from Hadoop and other Stack services
> - Ability to retain metrics at a high precision for a configurable time period (say 5
> - Ability to automatically purge metrics after retention period
> - At collection time, provide clear integration point for external system (such as TSDB)
> - At purge time, provide clear integration point for metrics retention by external system
> - Should provide default options for external metrics retention (say “HDFS”)
> - Provide tools / utilities for analyzing metrics in retention system (say “Hive schema,
Pig scripts, etc” that can be used with the default retention store “HDFS”)
> *System Requirements*
> - Must be portable and platform independent
> - Must not conflict with any existing metrics system (such as Ganglia)
> - Must not conflict with existing SNMP infra
> - Must not run as root
> - Must have HA story (no SPOF)
> *Usage*
> - Ability to obtain metrics from Ambari REST API (point in time and temporal)
> - Ability to view metric graphs in Ambari Web (currently, fixed)
> - Ability to configure custom metric graphs in Ambari Web (currently, we have metric
graphs “fixed” into the UI)
> - Need to improve metric graph “navigation” in Ambari Web (currently, metric graphs
do not allow navigation at arbitrary timeframes, but only at ganglia aggregation intervals)

> - Ability to “view cluster” at point in time (i.e. see all metrics at that point)
> - Ability to define metrics (and how + where to obtain) in Stack Definitions

This message was sent by Atlassian JIRA
