hbase-issues mailing list archives

From "stack (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-4354) track region history
Date Fri, 09 Sep 2011 03:17:10 GMT

    [ https://issues.apache.org/jira/browse/HBASE-4354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13100910#comment-13100910 ]

stack commented on HBASE-4354:

A long time ago we had a history column in .META.  It tried to note each transition a region
went through.  The history of a region was kept in its row up in .META.  It was a kinda nice
feature.  It was also a super pain at the same time.  We had lots of issues around regionservers
trying to update history in .META. when .META. was gone, so we couldn't keep full history;
e.g. the close of a region on a cluster shutdown.  There may have been deadlocks too around
updating history while trying to do edits in .META. but my memory may not be serving me right
here.  In the end we stripped the feature out because it was more trouble than it was worth.

That said, I think this would be good to have.  The natural place to do this stuff would be
in a table inside hbase I'd think.  But then what to do if this table is not online or if
we are shutting down the cluster and you want to log region close?

> track region history
> --------------------
>                 Key: HBASE-4354
>                 URL: https://issues.apache.org/jira/browse/HBASE-4354
>             Project: HBase
>          Issue Type: New Feature
>          Components: master, metrics, regionserver
>            Reporter: Ming Ma
>            Assignee: Ming Ma
> For debugging and analysis purposes it will be useful to understand a region's lifecycle:
> how it is created (from which parent region, for example), how it is split, assigned,
> etc. Some of this information is in the logs, the HBase .META. table, ZooKeeper, and metrics.
> Certain history data is lost; for example, the states are removed from ZooKeeper's
> /hbase/unassigned once the region is assigned; also, the .META. table has max versions of 10
> and thus only tracks the last 10 RS assignments of a given region. It would be nice to put
> this history in a central place. It can provide:
> 1. How applications use HBase. For example, an application might create a large number of
> regions in a short period of time and drop the table later.
> 2. How HBase internally manages regions, such as how regions are split, assigned, taken
> offline, etc.
> Things to track
> 1. How it is created, including the parent region in the case of a split.
> 2. Region transition process, such as region state change and region server change.
> One idea is to put such transition history data in ZooKeeper. One issue is that it could blow
> up ZooKeeper memory if we have a large number of regions and the cluster runs for a long time.
> I would like to get your feedback on different approaches to address the issue. One assumption
> is that region assignment doesn't happen with high frequency and thus the overhead introduced
> won't have much impact on system performance.
> Approach 1:
> ZooKeeper knows the history of how /hbase/unassigned is modified; if we can somehow get
> ZooKeeper's logs (BookKeeper?), we know the history of region transitions.
> Approach 2:
> 1. HBase logs extra region transition data to ZooKeeper. It could be one ZooKeeper node
> per transaction.
> 2. Have a separate thread on the Master to move data from ZooKeeper and append it to HDFS.
> That will keep the ZooKeeper size in check.
> 3. Have some tool or web UI to show the history of a given region by looking at ZooKeeper
> and HDFS.
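
The three steps of Approach 2 can be sketched roughly as below. This is only an illustration under stated assumptions, not HBase or ZooKeeper code: the class `RegionHistorySketch`, its `Transition` record, and the methods `record`, `drain`, and `history` are all hypothetical names. An in-memory queue stands in for the per-transition ZooKeeper nodes and a list of log lines stands in for the append-only HDFS file.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

/** Hypothetical sketch; none of these names are HBase or ZooKeeper APIs. */
class RegionHistorySketch {

    /** One region transition, e.g. OFFLINE -> OPENING on a given region server. */
    public static final class Transition {
        final String region;
        final String fromState;
        final String toState;
        final String server;
        final long timestampMillis;

        public Transition(String region, String fromState, String toState,
                          String server, long timestampMillis) {
            this.region = region;
            this.fromState = fromState;
            this.toState = toState;
            this.server = server;
            this.timestampMillis = timestampMillis;
        }

        /** Tab-separated line: timestamp, region, from->to, server. */
        String toLogLine() {
            return timestampMillis + "\t" + region + "\t"
                    + fromState + "->" + toState + "\t" + server;
        }
    }

    // Stand-in for the small, bounded store (one ZooKeeper node per entry in step 1).
    private final Queue<Transition> staging = new ConcurrentLinkedQueue<>();
    // Stand-in for the append-only archive (the HDFS file in step 2).
    private final List<String> archive = new ArrayList<>();

    /** Step 1: record a transition in the staging store. */
    public void record(Transition t) {
        staging.add(t);
    }

    /** Step 2: move everything currently staged into the archive; returns the count moved. */
    public synchronized int drain() {
        int moved = 0;
        Transition t;
        while ((t = staging.poll()) != null) {
            archive.add(t.toLogLine());
            moved++;
        }
        return moved;
    }

    /** Step 3: history of one region, archived entries first, then entries still staged. */
    public synchronized List<String> history(String region) {
        List<String> out = new ArrayList<>();
        for (String line : archive) {
            if (line.split("\t")[1].equals(region)) {
                out.add(line);
            }
        }
        for (Transition t : staging) {
            if (t.region.equals(region)) {
                out.add(t.toLogLine());
            }
        }
        return out;
    }

    /** Number of entries not yet drained; this is what step 2 keeps small. */
    public int stagedCount() {
        return staging.size();
    }
}
```

In a real deployment the drain would presumably run on a timer in a Master thread, and a lookup would have to read both stores, as `history` does here, because recent transitions may not have been moved yet.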

This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

