chukwa-dev mailing list archives

From "Eric Yang (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CHUKWA-667) Optimize the HBase schema for Ganglia queris
Date Sat, 04 Apr 2015 21:34:33 GMT

    [ https://issues.apache.org/jira/browse/CHUKWA-667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14395958#comment-14395958 ]

Eric Yang commented on CHUKWA-667:
----------------------------------

A further enhancement to the primary key section of the row key:

The first 6 hex digits are the prefix of the MD5 hash of the group name, followed by the
first 6 hex digits of the hash of the primary key name.

Example of this would be:

Hadoop.dfs.datanode.byteRead:host1.example.com

Here Hadoop.dfs.datanode.byteRead and host1.example.com are collocated primary keys.  When
aggregating metrics by host, however, Hadoop.dfs.datanode.byteRead is the part used to compute
the aggregate, so it is the more significant of the two.  Hence, we generate the primary
key as:

Hadoop.dfs.datanode.byteRead = 21da46
host1.example.com = a026db

For day 269 of the year, the row key would appear as:

26921da46a026db
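
As a rough sketch of how such a key could be assembled (Python for brevity; the function
names, the zero-padded 3-digit day of year, and the lowercase hex encoding are assumptions
for illustration, not something defined in the issue):

```python
import hashlib


def md5_prefix(name: str, digits: int = 6) -> str:
    """Return the first `digits` hex characters of the MD5 hash of `name`."""
    return hashlib.md5(name.encode("utf-8")).hexdigest()[:digits]


def compose_row_key(day_of_year: int, group_hash: str, key_hash: str) -> str:
    """Concatenate day of year, group hash prefix, and primary key hash prefix."""
    # Assumption: the day is zero-padded to 3 digits so every key has the same length.
    return f"{day_of_year:03d}{group_hash}{key_hash}"


def row_key(day_of_year: int, group: str, primary_key: str) -> str:
    """Build a row key from raw names by hashing each part first."""
    return compose_row_key(day_of_year, md5_prefix(group), md5_prefix(primary_key))


# Using the hash fragments quoted in the example above:
example = compose_row_key(269, "21da46", "a026db")  # "26921da46a026db"

# Hashing the raw names; the result is always 3 + 6 + 6 = 15 characters long.
hashed = row_key(269, "Hadoop.dfs.datanode.byteRead", "host1.example.com")
```

Keeping each part at a fixed width means the group hash and the primary key hash always sit
at the same offsets in the key.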

This enables programmers to customize a RowFilter to match either the more significant part
of the primary key or the less significant part.  Thoughts?
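
To illustrate the kind of matching I have in mind, here is a small sketch that emulates the
filtering with plain regular expressions in Python.  It is not HBase client code; in HBase
the same patterns could presumably be handed to a RowFilter with a RegexStringComparator.
The helper names and the extra sample keys (the "ffffff" and "000000" fragments) are made up
for illustration.

```python
import re

DAY = r"\d{3}"          # 3-digit day of year
HASH = r"[0-9a-f]{6}"   # 6 hex digits of a hash prefix


def group_pattern(group_hash: str) -> re.Pattern:
    """Match any day and any primary key for one group (more significant part)."""
    return re.compile(rf"^{DAY}{re.escape(group_hash)}{HASH}$")


def primary_key_pattern(key_hash: str) -> re.Pattern:
    """Match any day and any group for one primary key (less significant part)."""
    return re.compile(rf"^{DAY}{HASH}{re.escape(key_hash)}$")


# Hypothetical row keys; only the first one comes from the example above.
rows = [
    "26921da46a026db",  # day 269, byteRead, host1
    "26921da46ffffff",  # day 269, byteRead, some other host
    "269000000a026db",  # day 269, some other metric, host1
    "27021da46a026db",  # day 270, byteRead, host1
]

by_metric = [r for r in rows if group_pattern("21da46").match(r)]
by_host = [r for r in rows if primary_key_pattern("a026db").match(r)]
```

Here by_metric keeps every row for Hadoop.dfs.datanode.byteRead regardless of host, and
by_host keeps every row for host1.example.com regardless of metric.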

> Optimize the HBase schema for Ganglia queris
> --------------------------------------------
>
>                 Key: CHUKWA-667
>                 URL: https://issues.apache.org/jira/browse/CHUKWA-667
>             Project: Chukwa
>          Issue Type: Sub-task
>          Components: Data Processors
>    Affects Versions: 0.6.0
>            Reporter: Saisai Shao
>
> The Chukwa HBase table schema is designed for HICC; it cannot be fully adapted to the
> Ganglia web frontend for several reasons:
> (1) It cannot quickly retrieve all the cluster names and related host names.
> (2) System metrics have no attributes such as type or unit, so it is hard to interpret
> the collected metrics in code.
> (3) There is no data consolidation function: choosing a metric over a large time range
> (like 30 days) fetches all the data to draw the graph, which severely hurts performance.
> We will redesign the table schema so that it is better adapted to Ganglia web frontend
> queries.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
