hbase-dev mailing list archives

From "Billy Pearson (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HBASE-678) hbase needs a 'safe-mode'
Date Mon, 08 Sep 2008 04:54:46 GMT

    [ https://issues.apache.org/jira/browse/HBASE-678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12629069#action_12629069 ]

Billy Pearson commented on HBASE-678:
-------------------------------------

I think we should have a multi-stage safe mode process on the master, not just for the clients
but also to handle crash recovery of regions and region balancing while we are in safe mode.
We have an issue to help with balancing, HBASE-862, but I think it will still be helpful
to do all start-up balancing while in safe mode.

Assuming that, while in safe mode, we do not run compaction/split checks as regions load but
just queue them up:
Stage 1: Deploy all regions.
Stage 2: Do any crash recovery needed and flush the recovered edits to disk (removing the
recovery logs once the flush succeeds).
Stage 3: Do any balancing of the regions needed before exiting safe mode.
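A minimal Java sketch of the staged safe mode described above, assuming a single master object
that is told when each stage's work is done. The names SafeModeStage, SafeModeMaster,
requestCompactionCheck and advance are hypothetical and are not HBase code; the sketch only
shows compaction/split checks being queued while in safe mode and run once safe mode ends.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Hypothetical stages of a master-side safe mode (not an HBase API).
enum SafeModeStage { DEPLOY_REGIONS, RECOVER_AND_FLUSH, BALANCE, OFF }

class SafeModeMaster {
  private SafeModeStage stage = SafeModeStage.DEPLOY_REGIONS;
  // Compaction/split checks requested while in safe mode are queued, not run.
  private final Queue<String> deferredChecks = new ArrayDeque<>();
  private final List<String> executedChecks = new ArrayList<>();

  SafeModeStage stage() { return stage; }

  boolean inSafeMode() { return stage != SafeModeStage.OFF; }

  // Called when a region is loaded and would normally trigger a compaction/split check.
  void requestCompactionCheck(String region) {
    if (inSafeMode()) {
      deferredChecks.add(region);
    } else {
      executedChecks.add(region);
    }
  }

  // Move to the next stage once the current stage's work has finished.
  // The actual work of stage 2 (log replay, flush, deleting recovery logs
  // after a successful flush) is elided in this sketch.
  void advance() {
    switch (stage) {
      case DEPLOY_REGIONS: stage = SafeModeStage.RECOVER_AND_FLUSH; break;
      case RECOVER_AND_FLUSH: stage = SafeModeStage.BALANCE; break;
      case BALANCE:
        stage = SafeModeStage.OFF;
        // Leaving safe mode: run every check that was queued up.
        while (!deferredChecks.isEmpty()) {
          executedChecks.add(deferredChecks.poll());
        }
        break;
      case OFF: break; // already out of safe mode
    }
  }

  int deferredCount() { return deferredChecks.size(); }

  List<String> executedChecks() { return executedChecks; }
}
```

Queuing rather than dropping the checks means nothing is lost by deferring them; they simply start after balancing, when moving regions no longer has to wait on compactions.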

Stage 3 is there so that no compactions or splits are running on the regions, and we can
move them around as needed to even out the region counts.
If no compactions are running, closes happen immediately.
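The point about closes can be sketched as a region whose close has to take the same lock a
running compaction holds. This is a hypothetical illustration (SketchRegion and its methods are
not HBase code): close() returns true when it did not have to wait, and otherwise blocks until
the running compaction releases the lock.

```java
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical region: closing must wait for any compaction in progress.
class SketchRegion {
  private final ReentrantLock compactionLock = new ReentrantLock();
  private volatile boolean closed = false;

  // Runs a compaction while holding the lock; 'work' stands in for rewriting store files.
  void compact(Runnable work) {
    compactionLock.lock();
    try {
      if (!closed) {
        work.run();
      }
    } finally {
      compactionLock.unlock();
    }
  }

  // Returns true if the close completed without waiting, false if it had to wait.
  boolean close() {
    boolean immediate = compactionLock.tryLock();
    if (!immediate) {
      compactionLock.lock(); // blocks until the running compaction finishes
    }
    try {
      closed = true;
      return immediate;
    } finally {
      compactionLock.unlock();
    }
  }

  boolean isClosed() { return closed; }
}
```

During a rebalance, every close that returns false is a region still counted on its old server while the balancer keeps working, which is the lag described in the next paragraph.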

I have seen some rebalancing happen on startup, with the region servers going crazy trying to
balance, as Daniel commented above.
In my cluster this is mostly because closing regions have to wait for running compactions,
which makes the balancing counts lag.
When the compactions finish and the regions get closed and redeployed, the counts are all
out of balance again, and the same thing happens over and over until almost all the compactions
are done
and the regions can close and redeploy without the compaction lag.
Once we have done the above, everything will be ready for clients to connect to the cluster
without having to worry about balancing churn or regions in crash recovery.

Daniel: If we block region balancing while in safe mode, your clients can connect when we come
out of safe mode, but then balancing will kick in and you will see the same churn as we have
now.

> hbase needs a 'safe-mode'
> -------------------------
>
>                 Key: HBASE-678
>                 URL: https://issues.apache.org/jira/browse/HBASE-678
>             Project: Hadoop HBase
>          Issue Type: Improvement
>            Reporter: stack
>            Assignee: Jim Kellerman
>            Priority: Critical
>             Fix For: 0.19.0
>
>
> Internally we have a cluster of thousands of regions.  We just did an hbase restart w/
> master on a new node.  It just so happened that one of the regionservers was running extra slow
> (was loaded down by other processes).  That meant its portion of the assignments was taking
> a long time to come up...  While these regions were stuck in deploy mode, the cluster was not
> usable.
> We need a sort of 'safe-mode' in hbase where clients fail if they try to attach to a
> cluster not yet fully up.  The UI should show when all assignments have been successfully made
> so an admin can at least see when they have a problematic regionserver in their midst.
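The behaviour asked for in the issue (clients fail instead of attaching to a cluster that is not
fully up, plus a progress readout an admin can watch) could look roughly like this.
AssignmentTracker, SketchClient and ClusterNotReadyException are hypothetical names, and the
sketch assumes the client can see the master's count of assigned regions.

```java
// Hypothetical exception thrown when a client attaches before all regions are up.
class ClusterNotReadyException extends RuntimeException {
  ClusterNotReadyException(String msg) { super(msg); }
}

// Hypothetical master-side tracker of region assignments.
class AssignmentTracker {
  private final int totalRegions;
  private int assignedRegions;

  AssignmentTracker(int totalRegions) { this.totalRegions = totalRegions; }

  synchronized void regionAssigned() {
    if (assignedRegions < totalRegions) {
      assignedRegions++;
    }
  }

  synchronized boolean fullyAssigned() { return assignedRegions == totalRegions; }

  // Text a status page could show so an admin can spot a stuck regionserver.
  synchronized String progress() {
    return assignedRegions + "/" + totalRegions + " regions assigned";
  }
}

class SketchClient {
  private final AssignmentTracker tracker;

  SketchClient(AssignmentTracker tracker) { this.tracker = tracker; }

  // Fail fast instead of attaching to a cluster that is not fully up.
  void connect() {
    if (!tracker.fullyAssigned()) {
      throw new ClusterNotReadyException("cluster in safe mode: " + tracker.progress());
    }
  }
}
```

Failing fast with the progress in the message gives the client a clear reason to retry later, and the same progress string is what a UI would display.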

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

