hbase-user mailing list archives

From Abhijit Pol <a...@rocketfuel.com>
Subject Re: HBase cluster with heterogeneous resources
Date Sat, 16 Oct 2010 17:58:33 GMT
Thanks Tatsuya. Will give "vm.swappiness" a shot.


On Fri, Oct 15, 2010 at 4:42 PM, Tatsuya Kawano <tatsuya6502@gmail.com> wrote:

>
> Hi Abhi,
>
> > Can you give more inputs on what might be the drawbacks or risks of
> > permanent swap off or what was the observed horror?
>
>
> Turning off swap means you'll meet the Linux OOM Killer more often. The OOM
> Killer (Out Of Memory Killer) tends to kill processes that use more memory,
> so the RS can be targeted. Even worse, the OOM Killer can get stuck in a
> low-memory situation: it will use up CPU time (%system) and you won't be
> able to ssh into the machine for a while.
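Tatsuya's point about the OOM killer preferring large processes can be observed directly on a Linux box: the kernel exposes a per-process "badness" score under procfs. A minimal sketch (standard Linux `/proc` paths; not from the thread):

```shell
# Badness score the OOM killer currently assigns to this shell process;
# a region server JVM with a 50+GB heap would typically score far higher.
cat /proc/self/oom_score
```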
>
> Instead of turning off swap, I would suggest lowering a kernel parameter
> called "vm.swappiness". It takes a number between 0 and 100; a higher value
> makes the kernel swap more often so that it can allocate more RAM for the
> file cache, and a lower value makes it swap less often. So you want a lower
> value.
>
> It defaults to 60 on many Linux distributions. Try setting it to 0.
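For reference, the setting Tatsuya describes can be checked and changed like this (standard sysctl usage; changing it requires root):

```shell
cat /proc/sys/vm/swappiness        # current value, typically 60
sudo sysctl -w vm.swappiness=0     # apply until next reboot
echo 'vm.swappiness = 0' | sudo tee -a /etc/sysctl.conf   # persist across reboots
```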
>
> Thanks,
> Tatsuya
>
> --
> Tatsuya Kawano
> Tokyo, Japan
>
> http://twitter.com/tatsuya6502
>
>
>
>
> On 10/16/2010, at 3:12 AM, Abhijit Pol wrote:
>
> >>
> >>> we did swapoff -a and then updated fstab to permanently turn it off.
> >>
> >> You might not want to turn it off completely.  One of the lads was
> >> recently talking about the horrors that can happen when there is no swap.
> >>
> >> But it sounds like you were seeing over-eager swapping up to this point?
> >>
> >>
> > http://wiki.apache.org/hadoop/PerformanceTuning recommends removing swap,
> > and we had swap off on part of the cluster. Those machines were doing well
> > in terms of RS crashes, while the other machines were swapping a lot, so
> > we decided to turn it off for all RS machines.
> >
> > Can you give more inputs on what might be the drawbacks or risks of
> > permanent swap off or what was the observed horror?
> >
> >
> >
> >>> we observed swap was actually happening on the RSs, and after we turned
> >>> it off we have much more stable RSs.
> >>>
> >>> i can tell you what we have; not sure that is optimal. in fact, looking
> >>> for comments/suggestions from folks who have used it more:
> >>> 64GB RAM ==> 85% given to HBASE HEAP (30% memstore, 60% block cache),
> >>> 512MB DN and 512MB TT
> >>>
> >>
> >> So, I'm bad at math, but that's a heap of 50+GB?  How's that working out
> >> for you?  Have you played with GC tuning at all?  You might give more to
> >> the DN and the TT since you have plenty -- and more to the OS...
> >> perhaps less to hbase?
> >>
> >> How many disks?
> >>
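For the record, the split described above works out roughly as follows (plain integer arithmetic, not taken from the thread; the ~37GB cache figure Abhijit quotes later presumably reflects the running config):

```shell
total_mb=$((64 * 1024))               # 64GB of RAM, in MB
heap_mb=$((total_mb * 85 / 100))      # 85% to the HBase heap -> ~54GB
memstore_mb=$((heap_mb * 30 / 100))   # 30% of heap for memstores
blockcache_mb=$((heap_mb * 60 / 100)) # 60% of heap for the block cache
echo "heap=${heap_mb}MB memstore=${memstore_mb}MB blockcache=${blockcache_mb}MB"
```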
> >
> > We played with GC. What has worked well so far is starting CMS a little
> > early, at 40% occupancy. We removed the 6m newgen restriction and observed
> > that we are not growing beyond 18mb, and minor GC comes every second
> > instead of every 200ms in steady state (we might cap maxnewgen if things
> > go bad), but so far all pauses are small, less than a second, and no full
> > GC has kicked in.
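The CMS setup Abhijit describes would typically be expressed in hbase-env.sh along these lines (a sketch for the HotSpot JVMs of that era; the exact flag set is an assumption, not quoted from the thread):

```shell
# hbase-env.sh (sketch): CMS collector, initiate at 40% old-gen occupancy,
# plus GC logging so minor/major pause times can be watched.
export HBASE_OPTS="$HBASE_OPTS \
  -XX:+UseConcMarkSweepGC \
  -XX:CMSInitiatingOccupancyFraction=40 \
  -XX:+UseCMSInitiatingOccupancyOnly \
  -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps"
```

Removing the new-generation restriction corresponds to leaving `-Xmn` unset and letting the JVM size the young generation itself.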
> >
> > We have given more to HBase (and specifically to the block cache) because
> > we want 95th-percentile read latencies below 20ms, and our load is random
> > read heavy with light read-modify-writes.
> > The rationale was to go for small HBase blocks (8KB) together with a 64KB
> > HDFS block size -- larger than the HBase block but much smaller than the
> > HDFS default -- and a large block cache (~37GB) to improve the hit rate.
> > We did very limited experiments with different block sizes before going
> > with this configuration.
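The cache/memstore split and the HDFS block size described above map onto configuration properties roughly like this (a sketch using the property names of Hadoop/HBase of that era; values mirror the numbers in the thread):

```xml
<!-- hbase-site.xml (sketch): 60% of heap to block cache, 30% to memstores -->
<property>
  <name>hfile.block.cache.size</name>
  <value>0.6</value>
</property>
<property>
  <name>hbase.regionserver.global.memstore.upperLimit</name>
  <value>0.3</value>
</property>

<!-- hdfs-site.xml (sketch): 64KB HDFS blocks instead of the 64MB default -->
<property>
  <name>dfs.block.size</name>
  <value>65536</value>
</property>
```

The 8KB HBase block size, by contrast, is set per column family (via HColumnDescriptor / the BLOCKSIZE attribute in the HBase shell), not in hbase-site.xml.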
> >
> > We have 1GB for the DN. We don't run map-reduce much on this cluster, so
> > we have given 512MB to the TT. We have a separate Hadoop cluster for all
> > our MR and analytics needs.
> >
> > We have 6x1TB disks per machine.
> >
> >
> >>> we have 64KB HDFS block size
> >>
> >> Do you mean 64MB?
> >>
> >>
> >>
> > It's 64KB. Our keys are random enough to have a very low chance of
> > exploiting block locality, so every miss in the block cache will read one
> > or more random HDFS blocks anyway, and hence it makes sense to go for a
> > lower HDFS block size. After getting HBASE-3006 in, things improved a lot
> > for us.
> >
> > We use large 128MB blocks for our analytics hadoop cluster as it has more
> > seq. reads. Do you think a smaller size like 64KB might actually be
> > hurting us?
> >
> >
> >
> >> You've done the other stuff -- ulimits and xceivers?
> >>
> >
> > We have a 64k ulimit on all our hadoop cluster machines, and xceivers is
> > set to 2048 on the hbase cluster.
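Those two limits live in different places. The open-file cap is an OS setting, while xceivers is an HDFS DataNode property (spelled `dfs.datanode.max.xcievers` in hdfs-site.xml -- the misspelling is part of the actual property name in Hadoop of that era). A sketch of the OS side:

```shell
# /etc/security/limits.conf (sketch) -- raise the open-file cap for the
# user running the DN/RS processes; takes effect on next login:
#   hadoop  -  nofile  65536

ulimit -n    # verify after re-login; should report 65536
```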
> >
> >
> >>
> >> Hows it running for you?
> >>
> >
> > I will post some real numbers next week when we have it running for 7
> > days with the current config.
> >
> > I won't say we have nailed everything down, but it's better than what we
> > started with.
> >
> > Any inputs will be really helpful -- anything you think we are doing
> > stupid or totally missing :-)
>
> --
> Tatsuya Kawano (Mr.)
> Tokyo, Japan
>
> http://twitter.com/tatsuya6502
>
>
>
>
