Date: Sun, 11 May 2014 03:22:17 +0000 (UTC)
From: "Andrew Purtell (JIRA)"
To: issues@hbase.apache.org
Subject: [jira] [Commented] (HBASE-7958) Statistics per-column family per-region

    [ https://issues.apache.org/jira/browse/HBASE-7958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13994412#comment-13994412 ]

Andrew Purtell commented on HBASE-7958:
---------------------------------------

Thinking about reviving this issue. [~jesse_yates], could you comment on why this fizzled?

> Statistics per-column family per-region
> ---------------------------------------
>
>                 Key: HBASE-7958
>                 URL: https://issues.apache.org/jira/browse/HBASE-7958
>             Project: HBase
>          Issue Type: New Feature
>    Affects Versions: 0.95.2
>            Reporter: Jesse Yates
>            Assignee: Jesse Yates
>         Attachments: hbase-7958-v0-parent.patch, hbase-7958-v0.patch, hbase-7958_rough-cut-v0.patch
>
>
> Originating from this discussion on the dev list: http://search-hadoop.com/m/coDKU1urovS/Simple+stastics+per+region/v=plain
> Essentially, we should have built-in statistics gathering for HBase tables. This allows clients to have a better understanding of the distribution of keys within a table and a given region. We could also surface this information via the UI.
> There are a couple of different proposals from the email; the overview is this:
> We add something on compactions that gathers stats about the keys that are written, and then we surface them to a table.
> The possible proposals include:
> *How to implement it?*
> # Coprocessors -
> ** advantage - it easily plugs in and people could pretty easily add their own statistics.
> ** disadvantage - UI elements would also require this; we get into dependent loading, which leads down the OSGi path. Also, these CPs need to be installed _after_ all the other CPs on compaction to ensure they see exactly what gets written (doable, but a pain).
> # Built into HBase as a custom scanner
> ** advantage - always goes in the right place and no need to muck about with loading CPs etc.
> ** disadvantage - less pluggable, at least for the initial cut
> *Where do we store data?*
> # .META.
> ** advantage - it's an existing table, so we can jam it into another CF there
> ** disadvantage - this would make META much larger, possibly leading to splits AND will make it much harder for other processes to read the info
> # A new stats table
> ** advantage - cleanly separates out the information from META
> ** disadvantage - should use a 'system table' idea to prevent accidental deletion or manipulation by arbitrary clients, while still allowing clients to read it.
> Once we have this framework, we can then move to an actual implementation of various statistics.



--
This message was sent by Atlassian JIRA
(v6.2#6252)