hbase-user mailing list archives

From Michael Stack <st...@powerset.com>
Subject too busy host causes NotServingRegion exception?
Date Fri, 18 Apr 2008 15:39:21 GMT
....

> 08/04/18 01:51:15 starting compaction
> 08/04/18 01:51:22 region closed

I'd guess a split has just happened and that it was responsible for the
close of the region.

> 08/04/18 01:51:41 NotServingRegion Exception
> 08/04/18 01:51:47 compaction done
> 08/04/18 01:51:51 NotServingRegion Exception
> 08/04/18 01:52:01 NotServingRegion Exception
> 08/04/18 01:52:11 NotServingRegion Exception
> 08/04/18 01:52:21 NotServingRegion Exception

These 'exceptions' happen while no region is available to serve the
requested row.  I'd guess that during this time the master is being told of
the split and then tells other regionservers to open the new split daughters.


> 08/04/18 01:52:47 open the region in question
> 08/04/18 01:52:47 region available

It's a little distressing that it took a minute for the region to come back
online.


> the master log somehow got truncated, IIRC, the master tried to assign the
> region to this region server some where between 01:51:22 and 01:51:41.

Out of interest, where are these log messages coming out of?  Out of a .out
file or out of a .log file?

> From my understanding, this region server is a little busy so it does not
> accept the assignment from the master. I'm wondering if this is caused by
> too busy regionserver (the request per sec on each region server is about
> 1000), and if so, what configuration variables should I tune with?

If you are doing a bunch of splitting, there may be a queue of regions
waiting to open at the regionserver.  Currently they are processed serially,
which can take some time.  Do you have DEBUG enabled so you can see more of
what's going on?  (There may be an issue in TRUNK with setting this.)

> In addition, what would be the best practices when writing client by
> java to deal with such exception (as NotServingRegion should be common
> on a very busy HBase instance, I think).

Does this come out at your client?  If so, and it's looking like the
wanted region eventually comes online, try upping
hbase.client.retries.number.
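
If raising the retry count in configuration is not enough, the same idea can
be applied in client code.  What follows is only a rough sketch, not HBase
API: the class RetryingCall, the method callWithRetries, and the parameters
maxAttempts and pauseMs are all hypothetical names made up for illustration.
It assumes the failure reaches your code as an IOException (or a subclass)
and that the region will come back if you wait long enough.

```java
import java.io.IOException;
import java.util.concurrent.Callable;

// Hypothetical helper, not part of HBase: retries an operation that fails
// with an IOException, pausing a fixed time between attempts.
public class RetryingCall {

    public static <T> T callWithRetries(Callable<T> op, int maxAttempts, long pauseMs)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        IOException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                // Success: hand the result straight back to the caller.
                return op.call();
            } catch (IOException e) {
                // Assumed to be transient (for example, the region is not
                // being served yet); remember it and try again.
                last = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(pauseMs);
                }
            }
        }
        // Every attempt failed; surface the most recent failure.
        throw last;
    }
}
```

You would wrap whatever read or write your client does in the Callable.
Exceptions other than IOException are not retried and propagate immediately.
Pick maxAttempts times pauseMs to comfortably cover how long the region
stays offline; in the log above that was over a minute.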

> BTW, I was getting lots of different strange failures when doing the same
> thing on hadoop-0.16.X and hbase-0.1.X. After switching to hbase trunk,
> I only get the error above. It seems there are no more mysterious exceptions
> :-D

Can we see them please?  We're operating under the perhaps false notion that
our releases are the most stable hbase.

Thanks,
St.Ack

