Return-Path:
X-Original-To: apmail-chukwa-dev-archive@www.apache.org
Delivered-To: apmail-chukwa-dev-archive@www.apache.org
Received: from mail.apache.org (hermes.apache.org [140.211.11.3])
	by minotaur.apache.org (Postfix) with SMTP id A579510646
	for ; Fri, 8 Nov 2013 02:16:19 +0000 (UTC)
Received: (qmail 30282 invoked by uid 500); 8 Nov 2013 02:16:19 -0000
Delivered-To: apmail-chukwa-dev-archive@chukwa.apache.org
Received: (qmail 30189 invoked by uid 500); 8 Nov 2013 02:16:18 -0000
Mailing-List: contact dev-help@chukwa.apache.org; run by ezmlm
Precedence: bulk
List-Help:
List-Unsubscribe:
List-Post:
List-Id:
Reply-To: dev@chukwa.apache.org
Delivered-To: mailing list dev@chukwa.apache.org
Received: (qmail 30176 invoked by uid 500); 8 Nov 2013 02:16:18 -0000
Delivered-To: apmail-incubator-chukwa-dev@incubator.apache.org
Received: (qmail 30172 invoked by uid 99); 8 Nov 2013 02:16:18 -0000
Received: from arcas.apache.org (HELO arcas.apache.org) (140.211.11.28)
	by apache.org (qpsmtpd/0.29) with ESMTP; Fri, 08 Nov 2013 02:16:18 +0000
Date: Fri, 8 Nov 2013 02:16:18 +0000 (UTC)
From: "shreyas subramanya (JIRA)"
To: chukwa-dev@incubator.apache.org
Message-ID:
In-Reply-To:
References:
Subject: [jira] [Commented] (CHUKWA-700) Revisit Chukwa metrics schema design for HBase
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394


    [ https://issues.apache.org/jira/browse/CHUKWA-700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13816928#comment-13816928 ]

shreyas subramanya commented on CHUKWA-700:
-------------------------------------------

I think we need to support multiple custom schemas, since performance depends on the actual scan scenario (as observed in CHUKWA-667).
Just as we can configure the HBase table and column family in the demux processor, we need to provide an annotation for OutputCollector so that the demux processor can dictate the HBase schema.

> Revisit Chukwa metrics schema design for HBase
> ----------------------------------------------
>
>                 Key: CHUKWA-700
>                 URL: https://issues.apache.org/jira/browse/CHUKWA-700
>             Project: Chukwa
>          Issue Type: Bug
>          Components: Data Collection
>    Affects Versions: 0.6.0
>         Environment: MacOSX, Java
>            Reporter: Eric Yang
>
> The current Chukwa HBase schema looks like this:
> {code}
> - :...
> {code}
> A monotonically increasing timestamp cannot be distributed evenly across region servers without special handling and periodic care.
> It is time to revise the schema. The proposed schema looks like this:
> {code}
> - cf:...
> {code}
> The timestamp is stored with the cell, the row key helps to split data by hour, and a full hour of metrics is stored on the same row. PrimaryKey is replaced with a hash id of the primary key. Metrics tables used to aggregate metrics:
> chukwaMetrics -> chukwaMetricsMonthly -> chukwaMetricsYearly



--
This message was sent by Atlassian JIRA
(v6.1#6144)
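
For illustration only, the row-key idea in the quoted issue (data split by hour, primary key replaced with a hash id) might be sketched as below. The exact key layout is not recoverable from the message, so the field order, the "-" separator, the use of MD5, the 4-byte truncation, and the names HourlyRowKey, hourBucket, hashId, and rowKey are all assumptions, not Chukwa's actual implementation.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper; not part of Chukwa or HBase.
final class HourlyRowKey {
    private static final long HOUR_MS = 3_600_000L;

    // Truncate a millisecond timestamp to the start of its hour, so that
    // every metric from the same hour maps to the same row.
    static long hourBucket(long timestampMs) {
        return timestampMs - Math.floorMod(timestampMs, HOUR_MS);
    }

    // Derive a short hex id from the primary key.
    // Assumption: MD5 truncated to 4 bytes; the issue does not name a hash.
    static String hashId(String primaryKey) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(primaryKey.getBytes(StandardCharsets.UTF_8));
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < 4; i++) {
                sb.append(String.format("%02x", digest[i]));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    // Assumed layout: <hour bucket>-<hash id>. The exact timestamp would be
    // kept with the cell rather than in the key.
    static String rowKey(long timestampMs, String primaryKey) {
        return hourBucket(timestampMs) + "-" + hashId(primaryKey);
    }
}
```

With this layout, two samples from the same source taken within the same hour share one row key, while a sample from the next hour gets a new one. Whether the hash id or the hour bucket should lead the key depends on the scan pattern, which is the point raised in the comment above.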