hbase-user mailing list archives

From Bryan Duxbury <br...@rapleaf.com>
Subject Re: too busy host causes NotServingRegion exception?
Date Fri, 18 Apr 2008 16:14:04 GMT
NotServingRegionExceptions are normal when they appear in the  
regionserver logs. They're not normal when they come out of your  
client code. You get an NSRE when a region gets split or reassigned  
and the client's cache of the region's location is out of date.  
Normally, the HTable client retries a bunch, and eventually it gets  
sorted out. However, if the reassignment/splitting/etc takes longer  
than all the retries, the client will get the NSRE. In general we'd  
like for those not to happen, but I'm not sure that there's actually  
something wrong.
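
If the retries do run out and the NSRE makes it back to your code, one option is to wrap the commit in a retry loop of your own. Here's a rough sketch of that idea; it reuses the client calls from your snippet below (BatchUpdate, HTable.commit, NotServingRegionException), but the retry count and pause are made-up placeholders and the exact class/package names vary between HBase versions, so treat it as an illustration rather than a drop-in implementation:

// Sketch only: retry an HBase commit when the region is temporarily
// unavailable. Assumes imports for HTable, BatchUpdate,
// NotServingRegionException, and java.io.IOException.
void commitWithRetry(HTable table, BatchUpdate bu) throws IOException {
  final int maxAttempts = 5;       // placeholder value, tune to taste
  final long pauseMillis = 2000;   // placeholder value, tune to taste
  for (int attempt = 1; ; attempt++) {
    try {
      table.commit(bu);
      return;                      // commit went through
    } catch (NotServingRegionException nsre) {
      if (attempt >= maxAttempts) {
        throw nsre;                // still failing, give up
      }
      try {
        Thread.sleep(pauseMillis); // give the region time to come back online
      } catch (InterruptedException ie) {
        Thread.currentThread().interrupt();
        throw new IOException("interrupted while waiting to retry commit");
      }
    }
  }
}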

When you say once in a while, how frequent are you talking about?

If you want to tune this problem away, you can edit your hbase-site.xml
and change hbase.client.retries to be a bigger number and/or
hbase.client.pause to be longer. That might resolve your issue. If
something is actually broken in HBase, more retries won't help, and
that would be an interesting fact to know. If it is just a timing/load
issue, then more retries or a longer pause will probably fix it.
This would also be a really interesting fact to know :).
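
For example, something along these lines in hbase-site.xml (the values are just placeholders; note that in the versions I've looked at the retry-count property is spelled hbase.client.retries.number and hbase.client.pause is in milliseconds, so double-check the names against your hbase-default.xml):

<property>
  <name>hbase.client.retries.number</name>
  <value>10</value>
</property>
<property>
  <name>hbase.client.pause</name>
  <value>30000</value>
</property>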

Glad to hear that trunk erases some of the mystery of 0.16!

-Bryan

On Apr 18, 2008, at 3:29 AM, Rong-en Fan wrote:

> I'm running hbase and hadoop-0.17 trunk code as of earlier today (without
> HBASE-10). I'm loading 50m records into a table with ~800,000 rows and only
> one column family. This is a 3-node DFS with 3 region servers, and I load
> the data from one of these three boxes. Once in a while, I get a
> NotServingRegion exception. The code looks like:
>
> BatchUpdate bu = new BatchUpdate(row);
> bu.put(...);
> table.commit(bu);
>
> When I examine the region server's log, it shows something like:
>
> 08/04/18 01:51:14 open the region in question
> 08/04/18 01:51:15 region available
> 08/04/18 01:51:15 starting compaction
> 08/04/18 01:51:22 region closed
> 08/04/18 01:51:41 NotServingRegion Exception
> 08/04/18 01:51:47 compaction done
> 08/04/18 01:51:51 NotServingRegion Exception
> 08/04/18 01:52:01 NotServingRegion Exception
> 08/04/18 01:52:11 NotServingRegion Exception
> 08/04/18 01:52:21 NotServingRegion Exception
> 08/04/18 01:52:47 open the region in question
> 08/04/18 01:52:47 region available
>
> The master log somehow got truncated, but IIRC the master tried to assign
> the region to this region server somewhere between 01:51:22 and 01:51:41.
>
> From my understanding, this region server was a little busy, so it did not
> accept the assignment from the master. I'm wondering if this is caused by
> a too-busy region server (the requests per second on each region server are
> about 1000), and if so, which configuration variables should I tune?
> In addition, what are the best practices when writing a client in Java
> to deal with such exceptions (as NotServingRegion should be common
> on a very busy HBase instance, I think)?
>
> BTW, I was getting lots of different strange failures when doing  
> the same
> thing on hadoop-0.16.X and hbase-0.1.X. After switching to hbase  
> trunk,
> I only get the error above. It seems there are no more mysterious  
> exceptions :-D
>
> Thanks,
> Rong-En Fan

