hbase-issues mailing list archives

From "Jesse Yates (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-7958) Statistics per-column family per-region
Date Wed, 06 Mar 2013 21:22:13 GMT

    [ https://issues.apache.org/jira/browse/HBASE-7958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13595132#comment-13595132 ]

Jesse Yates commented on HBASE-7958:

So it looks like there is a desire for a pretty large range of possible statistics. I'd rather
we don't get bogged down in what specific statistics we want, but push more towards a design
discussion around enabling people to capture these statistics. We know we want them; the question
is how :)

Once we have the mechanisms in place to read/write a stats table for an individual stat, we
can much more easily expand that support to stats at different tie-in places. The 'at compaction
time histogram' seemed like an easy enough starting place for _one type of stat_, but that
should not necessarily limit the possible stats that can be collected; it's an immediate use-case
for a general statistics table.
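
To make the 'at compaction time histogram' idea concrete, here is a minimal sketch of one way such a stat could be gathered. It assumes an equal-depth histogram and that keys arrive in sorted order, as they would when a compaction writes them out. The class name CompactionHistogramSketch and its methods are hypothetical; none of this is taken from the attached patch or from the HBase API.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: equal-depth histogram over row keys seen in sorted order. */
public class CompactionHistogramSketch {
    private final int bucketSize;                       // keys per bucket (assumed tunable)
    private final List<String> upperBounds = new ArrayList<>();
    private final List<Integer> counts = new ArrayList<>();
    private int currentCount = 0;
    private String lastKey = null;

    public CompactionHistogramSketch(int bucketSize) {
        if (bucketSize <= 0) {
            throw new IllegalArgumentException("bucketSize must be positive");
        }
        this.bucketSize = bucketSize;
    }

    /** Call once per key, in the sorted order a compaction would write them. */
    public void observe(String rowKey) {
        lastKey = rowKey;
        currentCount++;
        if (currentCount == bucketSize) {
            closeBucket();
        }
    }

    /** Call when the compaction finishes, to flush a partially filled last bucket. */
    public void finish() {
        if (currentCount > 0) {
            closeBucket();
        }
    }

    // Records the last key seen as the bucket's upper bound, then starts a new bucket.
    private void closeBucket() {
        upperBounds.add(lastKey);
        counts.add(currentCount);
        currentCount = 0;
    }

    public List<String> upperBounds() { return upperBounds; }
    public List<Integer> counts() { return counts; }
}
```

Each bucket's upper bound and count would then be one candidate for what gets written to the stats table for that region and column family.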

Stepping back, it seems to me that we can have a basic set of statistics that you can enable
for a table at creation time (or turn on later). We then also need a mechanism
to let people add their own statistics easily (thinking a CP hook here). From there, we just
need a mechanism to make it easy to access each statistic.
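
As a rough illustration of those three pieces (a built-in stat, a plug-in point for custom stats, and a simple way to read a stat back), here is a self-contained sketch. Everything in it is hypothetical: the Statistic interface, the KeyCount stat, and the in-memory map standing in for a stats table are made up for illustration and are not the patch's StatisticsTable or any existing HBase or coprocessor API.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of a pluggable statistic plus a stand-in stats table. */
public class StatsPluginSketch {

    /** Hypothetical plug-in point: one implementation per statistic. */
    public interface Statistic {
        String name();
        void observe(String rowKey);
        long value();
    }

    /** Example custom statistic: counts the keys it is shown. */
    public static class KeyCount implements Statistic {
        private long count;

        @Override public String name() { return "keyCount"; }
        @Override public void observe(String rowKey) { count++; }
        @Override public long value() { return count; }
    }

    /** In-memory stand-in for a stats table, keyed by table/region/family/stat. */
    public static class InMemoryStatsTable {
        private final Map<String, Long> rows = new HashMap<>();

        private static String rowKey(String table, String region, String family, String stat) {
            return String.join("/", table, region, family, stat);
        }

        /** Stores the statistic's current value under its per-region, per-family key. */
        public void write(String table, String region, String family, Statistic stat) {
            rows.put(rowKey(table, region, family, stat.name()), stat.value());
        }

        /** Returns the stored value, or null if that statistic was never written. */
        public Long read(String table, String region, String family, String statName) {
            return rows.get(rowKey(table, region, family, statName));
        }
    }
}
```

The point of the sketch is only the shape: a custom stat implements one small interface, and readers look a stat up by table, region, column family, and stat name without knowing how it was computed.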

I don't think any of the above proposals really changes my proposed outline-patch besides
making it easy (easier?) to hook in custom stat implementations, adding a clean dynamic loading mechanism
(from the various //TODOs for CP hooks), and adding a little more utility in the StatisticsTable
class to make it easy to read a stat.

Sound reasonable?
> Statistics per-column family per-region
> ---------------------------------------
>                 Key: HBASE-7958
>                 URL: https://issues.apache.org/jira/browse/HBASE-7958
>             Project: HBase
>          Issue Type: New Feature
>    Affects Versions: 0.96.0
>            Reporter: Jesse Yates
>            Assignee: Jesse Yates
>             Fix For: 0.96.0
>         Attachments: hbase-7958_rough-cut-v0.patch
> Originating from this discussion on the dev list: http://search-hadoop.com/m/coDKU1urovS/Simple+stastics+per+region/v=plain
> Essentially, we should have built-in statistics gathering for HBase tables. This allows
clients to have a better understanding of the distribution of keys within a table and a given
region. We could also surface this information via the UI.
> There are a couple different proposals from the email, the overview is this:
> We add in something on compactions that gathers stats about the keys that are written
and then we surface them to a table.
> The possible proposals include:
> *How to implement it?*
> # Coprocessors - 
> ** advantage - it easily plugs in and people could pretty easily add their own statistics.
> ** disadvantage - UI elements would also require this, we get into dependent loading,
which leads down the OSGi path. Also, these CPs need to be installed _after_ all the other
CPs on compaction to ensure they see exactly what gets written (doable, but a pain)
> # Built into HBase as a custom scanner
> ** advantage - always goes in the right place and no need to muck about with loading
CPs etc.
> ** disadvantage - less pluggable, at least for the initial cut
> *Where do we store data?*
> # .META.
> ** advantage - it's an existing table, so we can jam it into another CF there
> ** disadvantage - this would make META much larger, possibly leading to splits AND will
make it much harder for other processes to read the info
> # A new stats table
> ** advantage - cleanly separates out the information from META
> ** disadvantage - should use a 'system table' idea to prevent accidental deletion, manipulation
by arbitrary clients, but still allow clients to read it.
> Once we have this framework, we can then move to an actual implementation of various

This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
