hbase-user mailing list archives

From stack <st...@duboce.net>
Subject Re: HBASE-138: Under load, regions become extremely large and eventually cause region servers to become unresponsive
Date Mon, 11 Feb 2008 19:27:15 GMT
Marc Harris wrote:
> Logs sent via yousendit.com.
Thanks for the logs.  I took a quick look.  Upload seems to be going 
along fine until we start getting the WrongRegionException.  In issue 
HBASE-428, you say your client is single-threaded.   Is it thick-headed 
too (smile) in that it unrelentingly keeps trying the same row over and 
over?  (The log seems to show a problem with the same row over and over 
again.)

Guessing as to what is up, either the client cache of regions is messed 
up or the .META. table has become corrupt somehow -- it doesn't have a 
list of all regions (perhaps it didn't get a split update or some such).

If the former, I wonder what would happen if you took your load off, 
killed the client, then resumed at the problematic row?  If things 
started to work again, that would seem to point at a client-side issue.
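
To make the "stale client cache" guess concrete, here is a toy model in 
Python.  It is only a sketch and is not HBase code: the names 
(WrongRegion, locate, put, try_put) and the region ids "r1"/"r2" are made 
up for illustration, and it assumes a region covers keys from its start 
key up to, but not including, its end key, with an empty end key meaning 
"no upper bound".  The client still believes "r1" covers every key, while 
the server has split it at "m".

```python
from bisect import bisect_right


class WrongRegion(Exception):
    """Toy stand-in for a server rejecting a row outside a region's range."""


def in_range(start, end, row):
    # Half-open range [start, end); an empty end key means "no upper bound".
    return start <= row and (end == "" or row < end)


def locate(cache, row):
    # cache is a list of (start, end, region_id) sorted by start key.
    # Pick the entry with the greatest start key that is <= row.
    starts = [entry[0] for entry in cache]
    return cache[bisect_right(starts, row) - 1]


def put(server_regions, region_id, row):
    # The server checks the row against the range it actually holds.
    start, end = server_regions[region_id]
    if not in_range(start, end, row):
        raise WrongRegion(f"row {row!r} is outside region {region_id}")
    return region_id


# What the server holds after a split at "m" (hypothetical layout).
server_regions = {"r1": ("", "m"), "r2": ("m", "")}

# The client never heard about the split: it thinks r1 covers everything.
stale_cache = [("", "", "r1")]

# A cache rebuilt from the server's view, as a restarted client would have.
fresh_cache = sorted((s, e, rid) for rid, (s, e) in server_regions.items())


def try_put(cache, row):
    # Returns the region id on success, None if the server rejected the row.
    _, _, region_id = locate(cache, row)
    try:
        return put(server_regions, region_id, row)
    except WrongRegion:
        return None


early_row = try_put(stale_cache, "a")                         # below the split point
stale_results = [try_put(stale_cache, "q") for _ in range(3)]  # blind retries
fresh_result = try_put(fresh_cache, "q")                       # after a cache rebuild

print(early_row, stale_results, fresh_result)
```

In this model rows below the split point keep going in fine, retrying 
the same row against the stale cache fails every time, and the same row 
succeeds once the cache is rebuilt.  That is the pattern the 
kill-and-resume test would be looking for.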

> Maybe "re-architect" was not an accurate representation of what I am
> doing. We currently do not have a solution that allows us to add rows to
> our system in arbitrary order and then analyze them, either in order or
> using map-reduce. A year or so ago we tried an RDBMS, and based on that
> experience, and some comments from Doug Cutting, decided that an RDBMS
> had no chance of being able to support this kind of functionality.
> In terms of performance parameters, the 200 rows/sec that was achieved
> for the first 500K rows was quite sufficient. I don't have a good answer
> because after all these rows get loaded there will be numerous
> map/reduce jobs that execute on them. I would guess that some vague
> parameters are:
> - In 3 days, load 100Gb of data representing 10M "units" split over 3
> tables each of which is split over 3 column families. Some fraction of
> these "units" will be replacements for existing ones (same key) some
> will be new
> - Several map-reduce jobs that mostly involve reading the data for each
> "unit" and then writing a few small pieces of data (a few bytes) for
> each "unit". Probably some more interesting maps too, but I don't know
> yet.
> - At least 2 map-reduce jobs that delete units.

These numbers look reasonable to me.  Let's try and make it work.
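
As a rough sanity check on those figures, here is a back-of-envelope 
calculation in Python.  It rests on assumptions that are not stated in 
the thread: "100Gb" is read as 100 decimal gigabytes, the load is spread 
evenly over the 3 days, and each "unit" produces one row in each of the 
3 tables.

```python
SECONDS = 3 * 24 * 60 * 60      # 3 days = 259,200 seconds
UNITS = 10_000_000              # 10M "units"
TOTAL_BYTES = 100 * 10**9       # assumption: "100Gb" means 100 decimal gigabytes
TABLES = 3                      # assumption: one row per table per unit
OBSERVED_ROWS_PER_SEC = 200     # rate reported for the first 500K rows

units_per_sec = UNITS / SECONDS
rows_per_sec = units_per_sec * TABLES
bytes_per_unit = TOTAL_BYTES / UNITS
bytes_per_sec = TOTAL_BYTES / SECONDS

print(f"units/sec needed: {units_per_sec:.1f}")
print(f"rows/sec needed:  {rows_per_sec:.1f}")
print(f"bytes per unit:   {bytes_per_unit:.0f}")
print(f"bytes/sec needed: {bytes_per_sec:.0f}")
print(f"within observed rate: {rows_per_sec < OBSERVED_ROWS_PER_SEC}")
```

Under those assumptions the required rate is about 39 units/sec, or 
roughly 116 rows/sec, which is below the 200 rows/sec already seen, at 
about 10 KB per unit.
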
> Am I correct when I say that using 4 region servers will just delay the
> problem by a factor of 4, or have I misunderstood the underlying cause?

The factor might be > 4 but effectively, if there is an issue with a 
single server, then the same issue will arise with N nodes.

