hbase-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Jonathan Hsieh (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HBASE-7958) Statistics per-column family per-region
Date Fri, 03 May 2013 22:18:16 GMT

     [ https://issues.apache.org/jira/browse/HBASE-7958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Jonathan Hsieh updated HBASE-7958:
----------------------------------

    Fix Version/s:     (was: 0.95.1)
    
> Statistics per-column family per-region
> ---------------------------------------
>
>                 Key: HBASE-7958
>                 URL: https://issues.apache.org/jira/browse/HBASE-7958
>             Project: HBase
>          Issue Type: New Feature
>    Affects Versions: 0.95.2
>            Reporter: Jesse Yates
>            Assignee: Jesse Yates
>         Attachments: hbase-7958_rough-cut-v0.patch, hbase-7958-v0-parent.patch, hbase-7958-v0.patch
>
>
> Originating from this discussion on the dev list: http://search-hadoop.com/m/coDKU1urovS/Simple+stastics+per+region/v=plain
> Essentially, we should have built-in statistics gathering for HBase tables. This allows
clients to have a better understanding of the distribution of keys within a table and a given
region. We could also surface this information via the UI.
> There are a couple different proposals from the email, the overview is this:
> We add in something on compactions that gathers stats about the keys that are written
and then we surface them to a table.
> The possible proposals include:
> *How to implement it?*
> # Coprocessors - 
> ** advantage - it easily plugs in and people could pretty easily add their own statistics.

> ** disadvantage - UI elements would also require this, we get into dependent loading,
which leads down the OSGi path. Also, these CPs need to be installed _after_ all the other
CPs on compaction to ensure they see exactly what gets written (doable, but a pain)
> # Built into HBase as a custom scanner
> ** advantage - always goes in the right place and no need to muck about with loading
CPs etc.
> ** disadvantage - less pluggable, at least for the initial cut
> *Where do we store data?*
> # .META.
> ** advantage - its an existing table, so we can jam it into another CF there
> ** disadvantage - this would make META much larger, possibly leading to splits AND will
make it much harder for other processes to read the info
> # A new stats table
> ** advantage - cleanly separates out the information from META
> ** disadvantage - should use a 'system table' idea to prevent accidental deletion, manipulation
by arbitrary clients, but still allow clients to read it.
> Once we have this framework, we can then move to an actual implementation of various
statistics.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

Mime
View raw message