hbase-user mailing list archives

From stack <st...@duboce.net>
Subject Re: HBASE-138: Under load, regions become extremely large and eventually cause region servers to become unresponsive
Date Fri, 08 Feb 2008 18:05:28 GMT
Marc Harris wrote:
> I have created a JIRA issue for this, HBASE-428
>
> Yes, things are improved a bit (it takes longer to get to the problem
> state by a factor of about 10 rows), but not much. I have put some of
> the exceptions in the bug. On Sunday I should be able to run the load
> again with debug logging on (if I find out how to). Probably not worth
> sending you my regionserver log until then.
>   

http://wiki.apache.org/hadoop/Hbase/FAQ#4

Yeah, DEBUG will help. It logs things like how long flushes and 
compactions are taking and the count of Store files being 
compacted at any one time.  That will help us figure out what's going on.

> At the moment the functionality that I am trying to re-architect runs
> happily on 1 server, so it would be a hard sell to say that we need 4
> servers for it. Anyway, as I understand the bug, wouldn't that just reduce
> the probability of a problematic region by a factor of 4? So the problem
> would just take 4 times as long to appear, which is not much help. It's
> not as if the nodes in a cluster can actually compensate for each other.
> But I don't fully understand what the issue is.
>
>   
Are you using an RDBMS in your current solution?  How close to your 
current solution does HBase have to come, Marc?  (And what are you looking 
for?  1M/10M/100M rows into a single server in N hours?)
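For a rough sense of scale, here is a back-of-the-envelope sketch in Java (HBase's own language) using only the figures Marc reported in this thread: about 200 rows/sec at the start, and about 500,000 rows after 54 minutes. The class and method names are hypothetical, and the projections assume a constant insert rate from a single client thread, which the thread shows does not hold once the problem appears.

```java
/**
 * Hypothetical helper: load-time arithmetic from the figures in this thread.
 * Assumes a constant insert rate, which is NOT what was observed once
 * WrongRegionExceptions started.
 */
public class LoadEstimate {

    /** Average rate in rows/sec for `rows` loaded over `minutes` of wall-clock time. */
    static double averageRowsPerSec(long rows, double minutes) {
        return rows / (minutes * 60.0);
    }

    /** Hours needed to load `rows` at a constant `rowsPerSec`. */
    static double hoursToLoad(long rows, double rowsPerSec) {
        return rows / rowsPerSec / 3600.0;
    }

    public static void main(String[] args) {
        // Observed: about 500,000 rows in 54 minutes before real failures began.
        System.out.printf("observed average: %.0f rows/sec%n",
                averageRowsPerSec(500_000, 54));

        // Projections at the initial ~200 rows/sec (single client thread).
        long[] targets = {1_000_000L, 10_000_000L, 100_000_000L};
        for (long rows : targets) {
            System.out.printf("%d rows: %.1f hours%n", rows, hoursToLoad(rows, 200.0));
        }
    }
}
```

Worked by hand: the observed average is about 154 rows/sec, and at a sustained 200 rows/sec the 1M, 10M and 100M targets would take roughly 1.4, 13.9 and 139 hours respectively.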

Thanks for persevering with the testing.

St.Ack


> - Marc
>
>
> On Thu, 2008-02-07 at 20:38 -0800, stack wrote:
>
>   
>> Marc Harris wrote:
>>     
>>> I have installed 0.16.0 rc 1 which I believe contains a fix for this
>>> issue, but I still see the same problem.
>>>
>>> - I am using a single node.
>>> - The client application runs in a single thread, loading data into a
>>> single table.
>>> - I get good throughput of about 200 rows/sec to start with, with
>>> occasional significant drops due to NotServingRegionExceptions that are
>>> recoverable on client retry (internal to HBase).
>>> - After 54 minutes, and about 500,000 rows, I start to see
>>> WrongRegionExceptions in the client application, i.e. real failures.
>>>   
>>>       
>> Are things improved at all?  Were you able to do 500k rows with previous 
>> hbase versions?
>>
>> Send us over some of those WREs.  We'd thought we'd fixed those.
>>
>>     
>>> - Throughput rapidly drops to only a few rows per minute plus a few rows
>>> that had errors
>>>
>>> Should I be adding these comments to the JIRA issue? I did not see a way
>>> to reopen the issue; perhaps I just don't have the permission necessary.
>>>
>>>   
>>>       
>> Yeah, make a JIRA.  Describe roughly the data type, sizes, and schema.  
>> Want to send me your regionserver log?  Do you have DEBUG enabled?  
>> That'd help.  (I still have to look at the log you sent me previously -- 
>> I'll get to it).  Is it critical that this work all on one server only?  
>> For example, would it be an option to run 4 servers?
>>
>> Thanks Marc,
>> St.Ack
>>
>>     
>>> Thanks,
>>> - Marc
>>>
>>>
>>>   
>>>       
>
>   

