chukwa-dev mailing list archives

From "Sreepathi Prasanna (JIRA)" <>
Subject [jira] [Commented] (CHUKWA-700) Revisit Chukwa metrics schema design for HBase
Date Sat, 09 Nov 2013 03:23:17 GMT


Sreepathi Prasanna commented on CHUKWA-700:

I think DataNucleus makes sense if the underlying store could be any database, such as MySQL,
Oracle, DB2, or HBase. AFAIK it does not support HDFS yet. Since we always store the data in
HBase, or at least that is what this Jira proposes, using DataNucleus might be unnecessary overhead.

Using an annotation-based approach should be okay for now.

> Revisit Chukwa metrics schema design for HBase
> ----------------------------------------------
>                 Key: CHUKWA-700
>                 URL:
>             Project: Chukwa
>          Issue Type: Bug
>          Components: Data Collection
>    Affects Versions: 0.6.0
>         Environment: MacOSX, Java
>            Reporter: Eric Yang
> Current Chukwa HBase schema looks like this:
> {code}
> <timestamp>-<primaryKey>   <columnFamily>:<cell>...
> {code}
> Row keys that start with a monotonically increasing timestamp cannot be distributed evenly
> across region servers without special handling and periodic maintenance.
> It is time to revise the schema. The proposed schema looks like this:
> {code}
> <hhddmmyyyy>-<primaryId>  cf:<cell>...
> {code}
> The timestamp is stored with the cell, the row key splits data by hour, and a full hour of
> metrics is stored in the same row. The primary key is replaced with a hash ID of the primary key.
> Metrics tables to aggregate metrics:
> chukwaMetrics -> chukwaMetricsMonthly -> chukwaMetricsYearly
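
For illustration only, the proposed row key could be built roughly as in the sketch below. This is not Chukwa code. The function name row_key, the use of MD5, the 8-character hash prefix, and the UTC hour bucket are all assumptions; the issue only states that the key is <hhddmmyyyy>-<primaryId> and that primaryId is a hash of the primary key.

```python
import hashlib
from datetime import datetime, timezone


def row_key(timestamp_ms: int, primary_key: str) -> str:
    """Build a '<hhddmmyyyy>-<primaryId>' row key (hypothetical helper).

    Assumptions not stated in the issue: the hour bucket is taken in UTC,
    and primaryId is the first 8 hex characters of an MD5 digest.
    """
    # Truncate to whole seconds; only the hour and date are used in the key.
    moment = datetime.fromtimestamp(timestamp_ms // 1000, tz=timezone.utc)
    # Hour first, then day, month, year, following the <hhddmmyyyy> layout.
    bucket = moment.strftime("%H%d%m%Y")
    # Hash the primary key so the key suffix has a fixed width.
    primary_id = hashlib.md5(primary_key.encode("utf-8")).hexdigest()[:8]
    return f"{bucket}-{primary_id}"


if __name__ == "__main__":
    # 1383967397000 ms is Sat, 09 Nov 2013 03:23:17 GMT.
    print(row_key(1383967397000, "host1.example.com"))
```

Under these assumptions, every sample from the same source within one hour maps to the same row, and the exact timestamp would travel with the cell rather than the key.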
