chukwa-dev mailing list archives

From "Eric Yang (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CHUKWA-667) Optimize the HBase schema for Ganglia queries
Date Sun, 12 Apr 2015 18:48:12 GMT

    [ https://issues.apache.org/jira/browse/CHUKWA-667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14491643#comment-14491643 ]

Eric Yang commented on CHUKWA-667:
----------------------------------

Hi Sreepathi,

Metrics for the whole day update the same row. However, a row is just a reference pointer
to the actual data blocks, so this reduces the number of lookups needed to reach a data block.
New cells are written to the WAL and buffered in memory, then spill to disk and are merged
during compaction. This design relieves the hot spot that a monotonically increasing row key
would otherwise create. Because we partition by day, the regions will reach an optimal balance
after one year of running. Partitioning by a numeric prefix is better than partitioning by a
metric group prefix, because some metric groups contain more metrics than others and would
therefore produce unevenly sized regions. For this reason, the design adds the day as the
prefix of the row key.
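
To make the row-per-day idea concrete, here is a minimal Python sketch. The key layout (a
2-byte day-of-year prefix followed by 6 bytes of an MD5 hash of "metric.source"), the helper
names build_row_key and put, and the use of UTC days are assumptions for illustration, not
Chukwa's verified code; a plain dictionary stands in for the HBase table. Every sample for the
same metric and host on the same day becomes a new cell in one row, and rows sort by day first.

```python
import hashlib
from collections import defaultdict
from datetime import datetime, timezone

# Hypothetical row-key layout (an assumption, not Chukwa's verified format):
#   2-byte big-endian day of year (1..366) + first 6 bytes of MD5("metric.source")
DAY_PREFIX_BYTES = 2
HASH_BYTES = 6


def build_row_key(timestamp_ms: int, metric: str, source: str) -> bytes:
    """Return the row key for one metric from one source on one UTC day."""
    day = datetime.fromtimestamp(timestamp_ms / 1000, tz=timezone.utc).timetuple().tm_yday
    # Big-endian encoding keeps byte order equal to numeric day order.
    prefix = day.to_bytes(DAY_PREFIX_BYTES, "big")
    # The hash spreads different metrics/hosts across the key space within a day,
    # so writes are not all appended at one monotonically increasing key.
    digest = hashlib.md5(f"{metric}.{source}".encode("utf-8")).digest()[:HASH_BYTES]
    return prefix + digest


def put(table, timestamp_ms: int, metric: str, source: str, value: float) -> None:
    """Store one sample as a new cell, qualified by its timestamp, in that day's row."""
    table[build_row_key(timestamp_ms, metric, source)][timestamp_ms] = value


# In-memory stand-in for an HBase table: row key -> {timestamp_ms: value}
table = defaultdict(dict)
for ts in (0, 10_000, 20_000):  # three samples on 1970-01-01 (day 1)
    put(table, ts, "cpu_user", "host1", 0.5)
put(table, 86_400_000, "cpu_user", "host1", 0.7)  # one sample on 1970-01-02 (day 2)

for key, cells in sorted(table.items()):
    # Prints the day prefix and the number of cells in that row.
    print(int.from_bytes(key[:DAY_PREFIX_BYTES], "big"), len(cells))
# 1 3
# 2 1
```

Because the prefix takes at most 366 values, writes would spread over all day prefixes only
once a full year has been populated, which fits the remark above about balance after one year.
In this sketch the same calendar day in different years maps to the same row key, so it
assumes older cells are expired (for example with a column family TTL) or are told apart by
their timestamps.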

> Optimize the HBase schema for Ganglia queries
> ---------------------------------------------
>
>                 Key: CHUKWA-667
>                 URL: https://issues.apache.org/jira/browse/CHUKWA-667
>             Project: Chukwa
>          Issue Type: New Feature
>          Components: Data Processors
>    Affects Versions: 0.6.0
>            Reporter: Saisai Shao
>             Fix For: 0.7.0
>
>         Attachments: CHUKWA-667.patch
>
>
> The Chukwa HBase table schema is designed for HICC and cannot be fully adapted to the Ganglia
> web frontend, for several reasons:
> (1) It cannot quickly retrieve all clusters and their related host names.
> (2) System metrics have no attributes, such as type or unit, so it is hard to interpret the
> collected metrics programmatically.
> (3) There is no data consolidation function: selecting a metric over a large time range
> (such as 30 days) fetches all of the raw data to draw the graph, which severely hurts performance.
> We will redesign the table schema so that it is better suited to Ganglia web frontend
> queries.
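
Point (3) asks for a data consolidation function. A minimal sketch of one is shown below, in
the spirit of the AVERAGE consolidation function of RRDtool, which Ganglia uses for its own
metric storage. The function name consolidate, the 10-second sampling interval, and the
one-hour bucket width are illustrative assumptions, not part of the attached patch. It
averages raw samples into fixed-width time buckets so that a long-range graph needs far
fewer points.

```python
def consolidate(samples, bucket_ms):
    """Average (timestamp_ms, value) samples into fixed-width time buckets.

    Returns a list of (bucket_start_ms, mean_value) sorted by bucket start.
    """
    sums = {}
    counts = {}
    for ts, value in samples:
        start = ts - ts % bucket_ms  # align each sample to the start of its bucket
        sums[start] = sums.get(start, 0.0) + value
        counts[start] = counts.get(start, 0) + 1
    return [(start, sums[start] / counts[start]) for start in sorted(sums)]


HOUR_MS = 3_600_000
# Hypothetical input: 30 days of samples taken every 10 seconds (360 per hour).
raw = [(i * 10_000, float(i % 360)) for i in range(30 * 24 * 360)]
hourly = consolidate(raw, HOUR_MS)
print(len(raw), len(hourly))  # 259200 720
```

Under these assumptions, 30 days of 10-second samples (259,200 points) shrink to 720 hourly
averages. One design choice is whether to compute such rollups at query time, as here, or to
store them at write time so that a long-range query reads only the consolidated rows.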



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
