hbase-user mailing list archives

From "M.Deniz OKTAR" <deniz.ok...@gmail.com>
Subject Re: region servers dying - flush request - YCSB
Date Mon, 07 Mar 2011 16:48:20 GMT
I don't know if it's normal, but I see a lot of '0's in the test results when
it tends to fail, such as:

 1196 sec: 7394901 operations; 0 current ops/sec;
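
A minimal sketch for spotting such stalls in saved YCSB status output, assuming every status line has the same shape as the one quoted above. The function name `find_stalls` and the second sample line are hypothetical and only there for illustration.

```python
import re

# Matches YCSB status lines of the form quoted in this message, e.g.
#   " 1196 sec: 7394901 operations; 0 current ops/sec;"
STATUS_RE = re.compile(
    r"^\s*(\d+)\s+sec:\s+(\d+)\s+operations;\s+([\d.]+)\s+current ops/sec;"
)

def find_stalls(lines):
    """Return (elapsed_sec, total_ops) for each status line reporting 0 ops/sec."""
    stalls = []
    for line in lines:
        m = STATUS_RE.match(line)
        if m is None:
            continue  # not a status line, skip it
        elapsed = int(m.group(1))
        total_ops = int(m.group(2))
        rate = float(m.group(3))
        if rate == 0.0:
            stalls.append((elapsed, total_ops))
    return stalls

sample = [
    " 1196 sec: 7394901 operations; 0 current ops/sec;",     # line from this message
    " 1206 sec: 7394950 operations; 4.9 current ops/sec;",   # hypothetical line
]
print(find_stalls(sample))
```

Comparing the timestamps of the stalled intervals with the last entries in the region server log may show whether throughput drops to zero at the moment the server disappears.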

--
deniz

On Mon, Mar 7, 2011 at 6:46 PM, M.Deniz OKTAR <deniz.oktar@gmail.com> wrote:

> Hi,
>
> Thanks for the effort, answers below:
>
>
>
>
> On Mon, Mar 7, 2011 at 6:08 PM, Stack <stack@duboce.net> wrote:
>
>> On Mon, Mar 7, 2011 at 5:43 AM, M.Deniz OKTAR <deniz.oktar@gmail.com>
>> wrote:
>> > We have a 5 node cluster, 4 of them being region servers. I am running a
>> > custom workload with YCSB and when the data is loading (heavy insert) at
>> > least one of the region servers dies after about 600000 operations.
>>
>>
>> Tell us the character of your 'custom workload' please.
>>
>>
> The workload is below; the part that fails is the loading phase (-load),
> which inserts all the records first.
>
> recordcount=10000000
> operationcount=3000000
> workload=com.yahoo.ycsb.workloads.CoreWorkload
>
> readallfields=true
>
> readproportion=0.5
> updateproportion=0.1
> scanproportion=0
> insertproportion=0.35
> readmodifywriteproportion=0.05
>
> requestdistribution=zipfian
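
A small sanity check on the workload file quoted above, as a sketch only. It assumes the file uses plain `key=value` lines; the helper names `parse_properties` and `proportion_sum` are hypothetical. In YCSB the load phase inserts `recordcount` records, while the `*proportion` settings describe the mix used in the later transaction phase, so the check reports both.

```python
import math

# The workload properties exactly as quoted in this message.
WORKLOAD = """\
recordcount=10000000
operationcount=3000000
workload=com.yahoo.ycsb.workloads.CoreWorkload
readallfields=true
readproportion=0.5
updateproportion=0.1
scanproportion=0
insertproportion=0.35
readmodifywriteproportion=0.05
requestdistribution=zipfian
"""

def parse_properties(text):
    """Parse simple key=value lines, ignoring blanks and '#' comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props

def proportion_sum(props):
    """Sum every property whose name ends in 'proportion'."""
    return sum(float(v) for k, v in props.items() if k.endswith("proportion"))

props = parse_properties(WORKLOAD)
total = proportion_sum(props)
print("records inserted by the load phase:", int(props["recordcount"]))
print("operation mix sums to 1.0:", math.isclose(total, 1.0))
```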
>
>
>
>
>>
>> > There are no abnormalities in the logs as far as I can see. The only
>> > common point is that all of them (in different trials, different region
>> > servers fail) log a flush request as their last entries, given below. The
>> > .out files are empty. I am looking at the /var/log/hbase folder for logs.
>> > Running the latest version of Sun Java 6. I couldn't find any logs that
>> > indicate a problem with Java. Tried the tests with OpenJDK and had the
>> > same results.
>> >
>>
>> It's strange that flush is the last thing in your log.  The process is
>> dead?  We are exiting w/o a note in logs?  That's unusual.  We usually
>> scream loudly when dying.
>>
>
> Yes, that's the strange part. The last line is a flush, as if the process
> never failed. Yes, the process is dead and HBase cannot see the node.
>
>
>>
>> > I have set ulimits (50000) and xceivers (20000) for multiple users and am
>> > certain that they are correct.
>>
>> The first line in an hbase log prints out the ulimit it sees.  You
>> might check that the hbase process for sure is picking up your ulimit
>> setting.
>>
> That was a mistake I made a couple of days ago. I checked it with cat
> /proc/<pid of regionserver>/limits and all related users, like 'hbase', have
> those limits. Checked the logs:
>
> Mon Mar  7 06:41:15 EET 2011 Starting regionserver on test-1
> ulimit -n 52768
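
The /proc check described above can also be scripted. This is a sketch under assumptions: it relies on the Linux layout of /proc/<pid>/limits, the helper names are hypothetical, and the sample text is illustrative (the soft value mirrors the `ulimit -n` line in the log above, the hard value is a placeholder).

```python
import re

# Illustrative excerpt in the layout Linux uses for /proc/<pid>/limits.
# Soft value mirrors the "ulimit -n 52768" log line; hard value is a placeholder.
SAMPLE_LIMITS = """\
Limit                     Soft Limit           Hard Limit           Units
Max cpu time              unlimited            unlimited            seconds
Max open files            52768                52768                files
"""

OPEN_FILES_RE = re.compile(r"^Max open files\s+(\S+)\s+(\S+)", re.MULTILINE)

def open_files_limit(limits_text):
    """Return (soft, hard) for 'Max open files' as strings; either may be 'unlimited'."""
    m = OPEN_FILES_RE.search(limits_text)
    if m is None:
        raise ValueError("no 'Max open files' row found")
    return m.group(1), m.group(2)

def read_limits(pid):
    """Read the limits of a running process (Linux only)."""
    with open(f"/proc/{pid}/limits") as f:
        return f.read()

soft, hard = open_files_limit(SAMPLE_LIMITS)
print(f"open files: soft={soft} hard={hard}")
```

On a region server host, `open_files_limit(read_limits(pid))` with the region server's pid would report the limits that the running process actually received, independent of what the shell configuration says.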
>
>>
>> > Also in the kernel logs, there are no apparent problems.
>> >
>>
>> (The mystery compounds)
>>
>> > 2011-03-07 15:07:58,301 DEBUG org.apache.hadoop.hbase.regionserver.CompactSplitThread: Compaction requested for usertable,user1030079237,1299502934627.257739740f58da96d5c5ef51a7d3efc3. because regionserver60020.cacheFlusher; priority=3, compaction queue size=18
>> > 2011-03-07 15:07:58,301 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: NOT flushing memstore for region usertable,user1601881548,1299502135191.f8efb9aa0922fa8a6a53fc49b8155ebc., flushing=false, writesEnabled=false
>> > 2011-03-07 15:07:58,301 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Started memstore flush for usertable,user1662209069,1299502135191.9fa929e6fb439843cffb604dea3f88f6., current region memstore size 68.6m
>> > 2011-03-07 15:07:58,310 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Flush requested on usertable,user1601881548,1299502135191.f8efb9aa0922fa8a6a53fc49b8155ebc.
>> > -end of log file-
>> > ---
>> >
>>
>> Nothing more?
>>
>>
> No, nothing after that. But quite a lot of logs before that, I can send
> them if you'd like.
>
>
>
>> Thanks,
>> St.Ack
>>
>
> Thanks a lot!
>
>
