hbase-user mailing list archives

From Stack <st...@duboce.net>
Subject Re: Some problems in one accident on my production cluster
Date Thu, 25 Feb 2016 03:41:54 GMT
On Wed, Feb 24, 2016 at 3:31 PM, Heng Chen <heng.chen.1986@gmail.com> wrote:

> The story is that I ran an MR job on my production cluster (0.98.6); it needs
> to scan one table during the map phase.
> Because of the heavy load from the job, all my RS crashed due to OOM.
Really big rows? If so, can you narrow your scan or ask for partial rows
(IIRC, you can do this in 0.98.x), or move up to HBase 1.1+, where
scanning does 'chunking'?
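
As a rough illustration of narrowing a scan and asking for partial rows with the 0.98 client API, a sketch along these lines might help. The column family name `d` and the numeric values are placeholders, not taken from the job in question. Note that `setBatch` caps the number of cells per `Result`, so a wide row comes back in several pieces and the mapper has to cope with partial rows.

```java
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

public class NarrowScanSketch {
    // Hypothetical column family; replace with the one the job actually reads.
    static final byte[] FAMILY = Bytes.toBytes("d");

    static Scan buildScan() {
        Scan scan = new Scan();
        // Narrow the scan: fetch only the family the job needs.
        scan.addFamily(FAMILY);
        // Partial rows: return at most 100 cells per Result (placeholder value).
        scan.setBatch(100);
        // Keep few Results per RPC so one call cannot carry a huge payload.
        scan.setCaching(10);
        // A full-table MR scan should not churn the region server block cache.
        scan.setCacheBlocks(false);
        return scan;
    }
}
```

A `Scan` built this way could then be handed to `TableMapReduceUtil.initTableMapperJob` when setting up the job; the right batch and caching values depend on cell sizes and would need tuning.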

> After I restarted all RS, I found one problem.
> All regions were reopened on one RS,

... the others took a while to check in? That's the usual reason one RS gets a
bunch of regions.

> and the balancer could not run because
> two regions were in transition. The cluster was stuck for a long time
> until I restarted the master.
> 1. Why did this happen?
Would need logs. I see you posted some later. Good to go to the server
that was doing the split and look in its log around the time of the split failure.

> 2. If the cluster has a lot of regions, how should I restart it after all RS
> crash? If I restart RS one by one, OOM may happen because one
> RS has to hold all regions, and it will take a long time.
Best to restart cluster in this case (after figuring why others took a
while to check in... look at their logs around startup time to see why they

> 3. Is it possible to give each table a request quota, so that
> when one table is requested heavily, it has no impact on other tables in the
> cluster?
Not sure what the state of this is in 0.98. Maybe someone closer to 0.98


> Thanks
