hbase-user mailing list archives

From Tatsuya Kawano <tatsuya6...@gmail.com>
Subject Re: HBase cluster with heterogeneous resources
Date Fri, 15 Oct 2010 23:42:20 GMT

Hi Abhi, 

> Can you give more input on what the drawbacks or risks of permanently
> turning swap off might be, or what the observed horror was?


Turning off swap means you'll meet the Linux OOM Killer more often. The OOM Killer (Out Of
Memory Killer) tends to kill processes that use the most memory, so the RS can be targeted.
Even worse, the OOM Killer can get stuck because of the low-memory situation; it will use up
CPU time (%system) and you won't be able to ssh into the machine for a while. 

Instead of turning off swap, I would suggest lowering a kernel parameter called "vm.swappiness".
It takes a number between 0 and 100; a higher value makes the kernel swap more often so that
it can allocate more RAM to the file cache, and a lower value makes it swap less often.
So you want a lower value. 

It defaults to 60 on many Linux distributions. Try setting it to 0.
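
For example, on a typical Linux box (a sketch; run as root):

    # change it immediately
    sysctl -w vm.swappiness=0

    # make it persist across reboots
    echo "vm.swappiness = 0" >> /etc/sysctl.conf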

Thanks, 
Tatsuya

--
Tatsuya Kawano
Tokyo, Japan

http://twitter.com/tatsuya6502




On 10/16/2010, at 3:12 AM, Abhijit Pol wrote:

>> 
>>> we did swapoff -a and then updated fstab to permanently turn it off.
>> 
>> You might not want to turn it off completely.  One of the lads was
>> recently talking about the horrors that can happen when there's no swap.
>> 
>> But it sounds like you were doing over-eager swapping up to this point?
>> 
>> 
> http://wiki.apache.org/hadoop/PerformanceTuning recommends removing swap, and
> we had swap off on part of the cluster; those machines were doing well in
> terms of RS crashes while the other machines were doing lots of swapping. So
> we decided to turn it off for all RS machines.
> 
> Can you give more input on what the drawbacks or risks of permanently
> turning swap off might be, or what the observed horror was?
> 
> 
> 
>>> we observed swap was actually happening on the RSs, and after we turned it
>>> off we have much more stable RSs.
>>> 
>>> 
>>> I can tell you what we have; not sure it's optimal. In fact, I'm looking
>>> for comments/suggestions from folks who have used it more:
>>> 64GB RAM ==> 85% given to the HBase heap (30% memstore, 60% block cache),
>>> 512MB DN and 512MB TT
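
For reference, a 30%/60% split like that maps to these hbase-site.xml
properties (a sketch using the 0.20-era property names, not necessarily the
exact files used here):

    <property>
      <name>hbase.regionserver.global.memstore.upperLimit</name>
      <value>0.3</value>
    </property>
    <property>
      <name>hfile.block.cache.size</name>
      <value>0.6</value>
    </property>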
>>> 
>> 
>> So, I'm bad at math, but that's a heap of 50+GB?  How's that working out
>> for you?  Have you played with GC tuning at all?  You might give more to
>> the DN and the TT since you have plenty -- and more to the OS...
>> perhaps less to HBase?
>> 
>> How many disks?
>> 
> 
> We played with GC. What has worked well so far is starting CMS a little early,
> at 40% occupancy; we removed the 6MB newgen restriction and observed that we
> are not growing beyond 18MB, with a minor GC coming every second instead of
> every 200ms in steady state (we might cap max newgen if things go bad). But
> so far all pauses are small, less than a second, and no full GC has kicked in.
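
Starting CMS early like that is typically done with flags along these lines in
hbase-env.sh (a sketch; these are standard HotSpot options, not necessarily the
exact command line used here):

    # ~85% of 64GB, in MB
    export HBASE_HEAPSIZE=55000
    export HBASE_OPTS="$HBASE_OPTS -XX:+UseConcMarkSweepGC \
        -XX:CMSInitiatingOccupancyFraction=40 \
        -XX:+UseCMSInitiatingOccupancyOnly"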
> 
> We have given more to HBase (and specifically to the block cache) because we
> want 95th-percentile read latencies below 20ms, and our load is random-read
> heavy with light read-modify-writes.
> The rationale was to go with small HBase blocks (8KB); an HDFS block size
> (64KB) larger than the HBase block but much smaller than the usual HDFS
> default; and a large block cache (~37GB) to improve the hit rate.
> We did very limited experiments with different block sizes before going
> with this configuration.
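
The 8KB block size is set per column family; from the HBase shell it would look
something like this (a sketch; the table and family names are made up):

    create 'mytable', {NAME => 'cf', BLOCKSIZE => '8192'}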
> 
> We have 1GB for the DN. We don't run map-reduce much on this cluster, so we
> have given 512MB to the TT. We have a separate Hadoop cluster for all our MR
> and analytics needs.
> 
> We have 6x1TB disks per machine.
> 
> 
>>> we have 64KB HDFS block size
>> 
>> Do you mean 64MB?
>> 
>> 
>> 
> It's 64KB. Our keys are random enough to have a very low chance
> of exploiting block locality, so every miss in the block cache will read one
> or more random HDFS blocks anyway; hence it makes sense to go with a smaller
> HDFS block size. After getting HBASE-3006 in, things improved a lot for us.
> 
> We use large 128MB blocks for our analytics Hadoop cluster, as it has more
> sequential reads. Do you think a smaller size like 64KB might actually be
> hurting us?
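
For anyone who wants to try a small HDFS block size, it's set in hdfs-site.xml
(a sketch; dfs.block.size is the 0.20-era property name, and it only applies
to newly written files):

    <property>
      <name>dfs.block.size</name>
      <value>65536</value> <!-- 64KB -->
    </property>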
> 
> 
> 
>> You've done the other stuff -- ulimits and xceivers?
>> 
> 
> We have a 64k ulimit on all our Hadoop cluster machines, and xceivers is set
> to 2048 on the HBase cluster.
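
Those two settings typically live in /etc/security/limits.conf and
hdfs-site.xml respectively (a sketch; the "hadoop" user name is an assumption,
and yes, the property name really is spelled "xcievers"):

    # /etc/security/limits.conf
    hadoop  -  nofile  65536

    <!-- hdfs-site.xml -->
    <property>
      <name>dfs.datanode.max.xcievers</name>
      <value>2048</value>
    </property>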
> 
> 
>> 
>> How's it running for you?
>> 
> 
> I will post some real numbers next week, when we have had it running for 7
> days with the current config.
> 
> I won't say we have nailed everything down, but it's better than what we
> started with.
> 
> Any input will be really helpful, as will anything you think we are doing
> wrong or totally missing :-)
