hbase-user mailing list archives

From Stack <st...@duboce.net>
Subject Re: region servers dying - flush request - YCSB
Date Mon, 07 Mar 2011 17:19:02 GMT
I'm stumped.  I have nothing to go on when there are no death throes or
complaints.  Is this hardware healthy for sure?  Other stuff runs w/o
issue?
St.Ack

On Mon, Mar 7, 2011 at 8:48 AM, M.Deniz OKTAR <deniz.oktar@gmail.com> wrote:
> I don't know if it's normal, but I see a lot of '0's in the test results
> when it tends to fail, such as:
>
>  1196 sec: 7394901 operations; 0 current ops/sec;
>
> --
> deniz
>
> On Mon, Mar 7, 2011 at 6:46 PM, M.Deniz OKTAR <deniz.oktar@gmail.com> wrote:
>
>> Hi,
>>
>> Thanks for the effort, answers below:
>>
>>
>>
>>
>> On Mon, Mar 7, 2011 at 6:08 PM, Stack <stack@duboce.net> wrote:
>>
>>> On Mon, Mar 7, 2011 at 5:43 AM, M.Deniz OKTAR <deniz.oktar@gmail.com>
>>> wrote:
>>> > We have a 5 node cluster, 4 of them being region servers. I am running a
>>> > custom workload with YCSB and when the data is loading (heavy insert) at
>>> > least one of the region servers dies after about 600000 operations.
>>>
>>>
>>> Tell us the character of your 'custom workload' please.
>>>
>>>
>> The workload is below; the part that fails is the loading phase (-load),
>> which inserts all the records first.
>>
>> recordcount=10000000
>> operationcount=3000000
>> workload=com.yahoo.ycsb.workloads.CoreWorkload
>>
>> readallfields=true
>>
>> readproportion=0.5
>> updateproportion=0.1
>> scanproportion=0
>> insertproportion=0.35
>> readmodifywriteproportion=0.05
>>
>> requestdistribution=zipfian
>>
>>
>>
>>
>>>
>>> > There are no abnormalities in the logs as far as I can see. The only
>>> > common point is that all of them (in different trials, different region
>>> > servers fail) request a flush as the last log entry, given below. The
>>> > .out files are empty. I am looking at the /var/log/hbase folder for logs.
>>> > Running the latest version of Sun Java 6. I couldn't find any logs that
>>> > indicate a problem with Java. Tried the tests with OpenJDK and had the
>>> > same results.
>>> >
>>>
>>> It's strange that flush is the last thing in your log.  The process is
>>> dead?  We are exiting w/o a note in logs?  That's unusual.  We usually
>>> scream loudly when dying.
>>>
>>
>> Yes, that's the strange part. The last line is a flush, as if the process
>> never failed. Yes, the process is dead and hbase cannot see the node.
>>
>>
>>>
>>> > I have set ulimits (50000) and xceivers (20000) for multiple users and am
>>> > certain that they are correct.
>>>
>>> The first line in an hbase log prints out the ulimit it sees.  You
>>> might check that the hbase process for sure is picking up your ulimit
>>> setting.
>>>
>> That was a mistake I made a couple of days ago. I checked it with cat
>> /proc/<pid of regionserver>/limits and all related users like 'hbase' have
>> those limits. Checked the logs:
>>
>> Mon Mar  7 06:41:15 EET 2011 Starting regionserver on test-1
>> ulimit -n 52768
>>
>>>
>>> > Also in the kernel logs, there are no apparent problems.
>>> >
>>>
>>> (The mystery compounds)
>>>
>>> > 2011-03-07 15:07:58,301 DEBUG org.apache.hadoop.hbase.regionserver.CompactSplitThread: Compaction requested for usertable,user1030079237,1299502934627.257739740f58da96d5c5ef51a7d3efc3. because regionserver60020.cacheFlusher; priority=3, compaction queue size=18
>>> > 2011-03-07 15:07:58,301 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: NOT flushing memstore for region usertable,user1601881548,1299502135191.f8efb9aa0922fa8a6a53fc49b8155ebc., flushing=false, writesEnabled=false
>>> > 2011-03-07 15:07:58,301 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Started memstore flush for usertable,user1662209069,1299502135191.9fa929e6fb439843cffb604dea3f88f6., current region memstore size 68.6m
>>> > 2011-03-07 15:07:58,310 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Flush requested on usertable,user1601881548,1299502135191.f8efb9aa0922fa8a6a53fc49b8155ebc.
>>> > -end of log file-
>>> > ---
>>> >
>>>
>>> Nothing more?
>>>
>>>
>> No, nothing after that. But quite a lot of logs before that, I can send
>> them if you'd like.
>>
>>
>>
>>> Thanks,
>>> St.Ack
>>>
>>
>> Thanks a lot!
>>
>>
>
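
The ulimit check discussed in the thread (reading /proc/<pid>/limits for the
running region server) could be scripted roughly as follows. This is only a
sketch, assuming a Linux /proc filesystem; the function names max_open_files
and read_limits are hypothetical helpers, not part of HBase or YCSB.

```python
def max_open_files(limits_text):
    """Return (soft, hard) for the 'Max open files' row of /proc/<pid>/limits.

    Each value is an int, or None when the kernel reports 'unlimited'.
    Returns None if the row is not present in the text.
    """
    def parse(value):
        return None if value == "unlimited" else int(value)

    for line in limits_text.splitlines():
        if line.startswith("Max open files"):
            # Row layout: Max open files <soft> <hard> files
            fields = line.split()
            return parse(fields[3]), parse(fields[4])
    return None


def read_limits(pid):
    """Read the raw limits table for a process (Linux only)."""
    with open(f"/proc/{pid}/limits") as f:
        return f.read()
```

Calling max_open_files(read_limits(pid)) with the region server's pid gives
the limits the process actually runs with, which can differ from what a login
shell for the same user reports.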
