chukwa-dev mailing list archives

From "Sreepathi Prasanna (JIRA)" <>
Subject [jira] [Commented] (CHUKWA-667) Optimize the HBase schema for Ganglia queries
Date Sun, 12 Apr 2015 08:15:12 GMT


Sreepathi Prasanna commented on CHUKWA-667:

I like the idea of separating the metadata and the metrics data into two different
tables. This saves a lot of space.

Regarding row key:

day+6 digits of md5(metricgroup+metric)+6 digits of md5(host) 
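
As a rough sketch of how such a key could be assembled (the `yyyyMMdd` day format in UTC, taking "6 digits" to mean the first 6 hex characters of the MD5 digest, plain concatenation without separators, and the sample metric and host names are all assumptions for illustration, not taken from the patch):

```python
import hashlib
from datetime import datetime, timezone


def md5_prefix(text: str, digits: int = 6) -> str:
    """Return the first `digits` hex characters of md5(text)."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()[:digits]


def row_key(ts: datetime, metric_group: str, metric: str, host: str) -> str:
    """Build day + md5(metricgroup+metric)[:6] + md5(host)[:6]."""
    # Assumption: the "day" part is the UTC date formatted as yyyyMMdd.
    day = ts.astimezone(timezone.utc).strftime("%Y%m%d")
    return day + md5_prefix(metric_group + metric) + md5_prefix(host)


# Hypothetical metric group, metric, and host names.
t1 = datetime(2015, 4, 12, 8, 15, tzinfo=timezone.utc)
t2 = datetime(2015, 4, 12, 8, 16, tzinfo=timezone.utc)
k1 = row_key(t1, "SystemMetrics", "cpu.user", "host1.example.com")
k2 = row_key(t2, "SystemMetrics", "cpu.user", "host1.example.com")

# Two samples one minute apart on the same day map to the same row key.
print(k1 == k2)  # True
```

Under these assumptions the key has no component finer than the day, so the minute would have to be carried somewhere else, for example in the column qualifier or the cell timestamp.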

I mostly agree with this design, but I have a question. Does this mean that all metrics collected per
minute on the same day would hit the same row? Is that performant? Also, if you are aggregating
the data every 15 minutes, wouldn't that put load on the same rows where writes are happening?
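
To put rough numbers on the concern, here is a small sketch that assumes one sample per minute for a single metric and host, and a key whose only time component is the day (a `yyyyMMdd` format is assumed). It only counts writes and aggregation windows per day; it does not model HBase behaviour or say whether this is actually a bottleneck:

```python
from collections import Counter
from datetime import datetime, timedelta, timezone

# One hypothetical sample per minute over a single UTC day.
start = datetime(2015, 4, 12, tzinfo=timezone.utc)
samples = [start + timedelta(minutes=m) for m in range(24 * 60)]

# With a day-granularity key, every sample of that day lands on one row
# (per metric and host), so count the writes per day bucket.
writes_per_day_row = Counter(ts.strftime("%Y%m%d") for ts in samples)

# Number of 15-minute aggregation windows that would read the same row.
windows_per_day = (24 * 60) // 15

print(writes_per_day_row)  # Counter({'20150412': 1440})
print(windows_per_day)     # 96
```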

> Optimize the HBase schema for Ganglia queries
> ---------------------------------------------
>                 Key: CHUKWA-667
>                 URL:
>             Project: Chukwa
>          Issue Type: New Feature
>          Components: Data Processors
>    Affects Versions: 0.6.0
>            Reporter: Saisai Shao
>             Fix For: 0.7.0
>         Attachments: CHUKWA-667.patch
> The Chukwa HBase table schema is designed for HICC; it cannot be fully adapted to the Ganglia
> web frontend for several reasons:
> (1) It cannot quickly retrieve all the cluster names and related host names.
> (2) System metrics have no attributes, such as type or unit, so it is hard to interpret the
> collected metrics in code.
> (3) It lacks a data consolidation function: choosing a metric over a large time range (like 30
> days) fetches all the data to draw the graph, which greatly degrades performance.
> We will redesign the table schema so that it is better adapted to the Ganglia web frontend.

This message was sent by Atlassian JIRA
