Date: Sun, 12 Apr 2015 18:48:12 +0000 (UTC)
From: "Eric Yang (JIRA)"
To: dev@chukwa.apache.org
Subject: [jira] [Commented] (CHUKWA-667) Optimize the HBase schema for Ganglia queris

    [ https://issues.apache.org/jira/browse/CHUKWA-667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14491643#comment-14491643 ]

Eric Yang commented on CHUKWA-667:
----------------------------------

Hi Sreepathi,

Metrics for the whole day will update the same row. However, a row is just a reference pointer to the actual data block, so this reduces the number of lookups into the data block. New cells are appended in memory and to the WAL, and are spilled to disk later, during flushes and compactions. This design relieves the stress point created by a monotonically increasing index. Because we partition by day, the regions will reach an optimal balance after one year of running.
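The following minimal Python sketch illustrates the "one row per metric per day" layout. It rests on several assumptions, none of which are Chukwa's actual schema code. The key format (zero-padded UTC day of year, then metric name, then source host) is hypothetical. Using the sample timestamp as the column qualifier is also an assumption. Finally, the `row_key`/`put_sample` helpers and the dict that stands in for an HBase table are illustrative only. The sketch shows that every sample of one metric from one host on a given day lands in a single row, while the leading day number cycles through at most 366 prefixes.

```python
from collections import defaultdict
from datetime import datetime, timezone


def row_key(ts_ms: int, metric: str, source: str) -> str:
    """Hypothetical row key: zero-padded UTC day of year, then metric, then source host.

    All samples of one metric from one host on the same day map to the same row,
    and the leading day number spreads writes over at most 366 prefixes.
    """
    day = datetime.fromtimestamp(ts_ms / 1000, tz=timezone.utc).timetuple().tm_yday
    return f"{day:03d}:{metric}:{source}"


def put_sample(table, ts_ms: int, metric: str, source: str, value: float) -> None:
    """Append one sample as a new cell (qualifier = timestamp) in its day's row.

    Assumption: the same day of year in different years shares a row; the
    timestamp qualifier keeps those samples distinct.
    """
    table[row_key(ts_ms, metric, source)][ts_ms] = value


# In-memory stand-in for an HBase table: row key -> {qualifier: value}.
table = defaultdict(dict)
DAY_MS = 24 * 60 * 60 * 1000
start = 1428796800000  # 2015-04-12 00:00:00 UTC (day 102 of the year)

# One sample every 15 seconds for two days from a single (hypothetical) host.
for t in range(start, start + 2 * DAY_MS, 15_000):
    put_sample(table, t, "cpu_user", "host1.example.com", 0.5)

for key in sorted(table):
    print(key, len(table[key]))
```

After the loop, `table` holds two rows, `102:cpu_user:host1.example.com` and `103:cpu_user:host1.example.com`, each with 5,760 cells. Because the prefix repeats every year, the set of prefixes stops growing after about a year. This is one way to read the point above that regions balance out after a year of running.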
Partitioning by a numeric value is better than partitioning by metric-group prefix. Some metric groups contain more metrics than others, so a metric-group prefix can produce regions of uneven size. For this reason, the design adds the day as the prefix of the row key.

> Optimize the HBase schema for Ganglia queris
> --------------------------------------------
>
>                 Key: CHUKWA-667
>                 URL: https://issues.apache.org/jira/browse/CHUKWA-667
>             Project: Chukwa
>          Issue Type: New Feature
>          Components: Data Processors
>    Affects Versions: 0.6.0
>            Reporter: Saisai Shao
>             Fix For: 0.7.0
>
>         Attachments: CHUKWA-667.patch
>
>
> The Chukwa HBase table schema is designed for HICC and cannot be fully adapted to the Ganglia web frontend, for several reasons:
> (1) It cannot quickly retrieve all clusters and their related host names.
> (2) System metrics have no attributes, such as type and unit, so it is hard to interpret the collected metrics in code.
> (3) There is no data consolidation function: choosing a metric over a large time range (such as 30 days) fetches all of the data to draw the graph, which greatly degrades performance.
> We will redesign the table schema so that it is better adapted to Ganglia web frontend queries.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)