chukwa-dev mailing list archives

From "Eric Yang (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (CHUKWA-700) Revisit Chukwa metrics schema design for HBase
Date Sat, 02 Nov 2013 21:09:19 GMT

     [ https://issues.apache.org/jira/browse/CHUKWA-700?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eric Yang updated CHUKWA-700:
-----------------------------

    Description: 
Current Chukwa HBase schema looks like this:

{code}
<timestamp>-<primaryKey>   <columnFamily>:<cell>...
{code}

A monotonically increasing timestamp in the row key cannot be distributed evenly across region
servers without periodic special handling.

It is time to revise the schema. The proposed schema looks like this:

{code}
<hhddmmyyyy>-<primaryId>  cf:<cell>...
{code}

The timestamp is stored with each cell, the row key splits the data by hour, and a full hour of
metrics is stored in the same row.  The primary key is replaced with a hash id of the primary
key.  Metrics are aggregated through the following tables:

chukwaMetrics -> chukwaMetricsMonthly -> chukwaMetricsYearly
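
The proposed row key can be sketched as follows. This is only an illustration under stated assumptions, not Chukwa code: the function names (primary_id, row_key, put), the choice of MD5 as the hash, the use of UTC, and the plain dict standing in for an HBase table are all hypothetical. The key layout follows the <hhddmmyyyy>-<primaryId> pattern exactly as written above.

```python
import hashlib
from datetime import datetime, timezone


def primary_id(primary_key: str) -> str:
    """Hash the primary key to a fixed-width hex id (MD5 is an assumption)."""
    return hashlib.md5(primary_key.encode("utf-8")).hexdigest()


def row_key(ts_ms: int, primary_key: str) -> str:
    """Build '<hhddmmyyyy>-<primaryId>' from an epoch-millisecond timestamp (UTC assumed)."""
    t = datetime.fromtimestamp(ts_ms / 1000, tz=timezone.utc)
    return t.strftime("%H%d%m%Y") + "-" + primary_id(primary_key)


def put(table: dict, ts_ms: int, primary_key: str, metric: str, value: float) -> None:
    """Store one sample; the exact timestamp is kept with the cell, not in the row key."""
    row = table.setdefault(row_key(ts_ms, primary_key), {})
    # Each column holds its samples keyed by timestamp, mimicking HBase cell versions.
    row.setdefault("cf:" + metric, {})[ts_ms] = value
```

With this layout, two samples from the same source within the same hour land in one row, and a sample from the next hour starts a new row.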

  was:
Current Chukwa HBase schema looks like this:

{code}
                                                <columnFamily>
<timestamp>-<primaryKey>   <cell>...
{code}

A monotonically increasing timestamp in the row key cannot be distributed evenly across region
servers without periodic special handling.

It is time to revise the schema. The proposed schema looks like this:

{code}
                                                <cf>
<hhddmmyyyy>-<primaryId>  <cell>...
{code}

The timestamp is stored with each cell, the row key splits the data by hour, and a full hour of
metrics is stored in the same row.  The primary key is replaced with a hash id of the primary
key.  Metrics are aggregated through the following tables:

chukwaMetrics -> chukwaMetricsMonthly -> chukwaMetricsYearly


> Revisit Chukwa metrics schema design for HBase
> ----------------------------------------------
>
>                 Key: CHUKWA-700
>                 URL: https://issues.apache.org/jira/browse/CHUKWA-700
>             Project: Chukwa
>          Issue Type: Bug
>          Components: Data Collection
>    Affects Versions: 0.6.0
>         Environment: MacOSX, Java
>            Reporter: Eric Yang
>
> Current Chukwa HBase schema looks like this:
> {code}
> <timestamp>-<primaryKey>   <columnFamily>:<cell>...
> {code}
> A monotonically increasing timestamp in the row key cannot be distributed evenly across
> region servers without periodic special handling.
> It is time to revise the schema. The proposed schema looks like this:
> {code}
> <hhddmmyyyy>-<primaryId>  cf:<cell>...
> {code}
> The timestamp is stored with each cell, the row key splits the data by hour, and a full
> hour of metrics is stored in the same row.  The primary key is replaced with a hash id of
> the primary key.  Metrics are aggregated through the following tables:
> chukwaMetrics -> chukwaMetricsMonthly -> chukwaMetricsYearly



--
This message was sent by Atlassian JIRA
(v6.1#6144)
