hbase-user mailing list archives

From Sean Bigdatafun <sean.bigdata...@gmail.com>
Subject Re: HBase cluster with heterogeneous resources
Date Fri, 15 Oct 2010 21:14:21 GMT
On Fri, Oct 15, 2010 at 11:12 AM, Abhijit Pol <apol@rocketfuel.com> wrote:

> >
> > > we did swapoff -a and then updated fstab to permanently turn it off.
> >
> > You might not want to turn it off completely.  One of the lads was
> > recently talking about the horrors that can happen when there is no swap.
> >
> > But it sounds like you were seeing over-eager swapping before this?
> >
> >
> http://wiki.apache.org/hadoop/PerformanceTuning recommends removing swap,
> and we had swap off on part of the cluster. Those machines were doing well
> in terms of RS crashes, while the other machines were swapping heavily, so
> we decided to turn it off on all RS machines.
>
> Can you give more input on what the drawbacks or risks of permanently
> turning swap off might be, or what the observed horror was?
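[For reference, a common middle ground between a full swapoff and the default behavior is lowering vm.swappiness, which keeps swap as an emergency buffer while strongly discouraging its use. A sketch of both approaches; the paths and values are the usual Linux defaults, not taken from this thread:]

```shell
# Option 1: disable swap entirely (what the poster did).
sudo swapoff -a
# ...then comment out the swap entries in /etc/fstab so it stays off
# across reboots.

# Option 2: keep swap but make the kernel avoid it except under real
# memory pressure, which sidesteps the OOM-killer risk of having no swap.
sudo sysctl vm.swappiness=0
echo 'vm.swappiness = 0' | sudo tee -a /etc/sysctl.conf
```

[Note that on newer kernels vm.swappiness=0 behaves close to a hard disable; a small value like 1 is sometimes preferred.]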
>
>
>
> > > we observed swap was actually happening on the RSs, and after we turned
> > > it off we have much more stable RSs.
> > >
> > > I can tell you what we have; not sure it is optimal. In fact, I'm
> > > looking for comments/suggestions from folks who have used it more:
> > > 64GB RAM ==> 85% given to HBASE heap (30% memstore, 60% block cache),
> > > 512MB DN, and 512MB TT
> > >
> >
> > So, I'm bad at math, but that's a heap of 50+GB?  How's that working out
> > for you?  Have you played with GC tuning at all?  You might give more to
> > the DN and the TT since you have plenty -- and more to the OS...
> > perhaps less to hbase?
> >
> > How many disks?
> >
>
> We played with GC. What has worked well so far is starting CMS a little
> early, at 40% occupancy; we removed the 6m newgen restriction and observed
> that newgen is not growing beyond 18MB, with a minor GC every second
> instead of every 200ms in steady state (we might cap max newgen if things
> go bad). So far all pauses have been small, less than a second, and no
> full GC has kicked in.
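[The CMS behavior described above would typically be expressed as JVM flags in hbase-env.sh. A sketch assuming the CMS collector of that era; the exact flags used by the poster are not shown in the thread:]

```shell
# hbase-env.sh (sketch): use CMS, start it at 40% old-gen occupancy, and
# make the JVM honor that threshold instead of its own heuristics.
export HBASE_OPTS="$HBASE_OPTS \
  -XX:+UseConcMarkSweepGC \
  -XX:CMSInitiatingOccupancyFraction=40 \
  -XX:+UseCMSInitiatingOccupancyOnly"

# If young-gen growth ever became a problem, it could be capped with e.g.
# -XX:MaxNewSize=128m (hypothetical value; the poster leaves newgen unbounded).
```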
>
> We have given more to HBase (and specifically to the block cache) because
> we want 95th-percentile read latencies below 20ms, and our load is random
> read heavy with light read-modify-writes.
> The rationale was to go for small HBase blocks (8KB), an HDFS block size
> (64KB) that is larger than the HBase block but far smaller than the
> default, and a large block cache (~37GB) to improve the hit rate.
> We did very limited experiments with different block sizes before going
> with this configuration.
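[Back-of-the-envelope, the percentages quoted above work out roughly as follows. This is pure arithmetic, not from the thread; note the result comes out a bit below the ~37GB block cache mentioned:]

```python
ram_gb = 64
heap_gb = 0.85 * ram_gb          # 85% of RAM to the HBase heap
block_cache_gb = 0.60 * heap_gb  # 60% of heap to the block cache
memstore_gb = 0.30 * heap_gb     # 30% of heap to memstores

print(f"heap:        {heap_gb:.1f} GB")         # ~54.4 GB
print(f"block cache: {block_cache_gb:.1f} GB")  # ~32.6 GB
print(f"memstore:    {memstore_gb:.1f} GB")     # ~16.3 GB
```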
>
> We have given 1GB to the DN. We don't run map-reduce much on this cluster,
> so we gave 512MB to the TT. We have a separate Hadoop cluster for all our
> MR and analytics needs.
>
> We have 6x1TB disks per machine.
>
>
> > > we have 64KB HDFS block size
> >
> > Do you mean 64MB?
> >
> >
> >
> It's 64KB. Our keys are random enough that there is very little chance of
> exploiting block locality, so every miss in the block cache will read one
> or more random HDFS blocks anyway; hence it makes sense to go for a lower
> HDFS block size. After getting HBASE-3006 in, things improved a lot for us.
>

If this is your setup, your HDFS namenode is bound to OOM soon. (The
namenode's memory consumption is proportional to the number of blocks on
HDFS.)

I guess you meant "hfile.min.blocksize.size"? That is a different parameter
from the HDFS block size, IMO. (Need someone to confirm.)
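[To put numbers on the namenode concern: with 6x1TB of disk per node and 64KB HDFS blocks, the block count explodes. A rough estimate, where the ~150 bytes of namenode heap per block is a common rule of thumb and not a figure from this thread:]

```python
disks_per_node = 6
disk_tb = 1
block_size_kb = 64
nn_bytes_per_block = 150  # rule-of-thumb namenode heap cost per block

# Physical block replicas that would fit on one node's disks if they filled up.
node_bytes = disks_per_node * disk_tb * 1024**4
blocks_per_node = node_bytes // (block_size_kb * 1024)
nn_gb_per_node = blocks_per_node * nn_bytes_per_block / 1024**3

print(f"blocks per node: {blocks_per_node:,}")             # ~100 million
print(f"NN heap for one node's blocks: {nn_gb_per_node:.1f} GB")  # ~14.1 GB
```

[Roughly 100 million blocks per node, i.e. on the order of 14GB of namenode heap per full node, versus about 100,000 blocks per node at the 64MB default.]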


>
>  We use large 128MB blocks for our analytics Hadoop cluster as it has more
> sequential reads. Do you think a smaller size like 64KB might actually be
> hurting us?
>
>
>
> > You've done the other stuff -- ulimits and xceivers?
> >
>
> We have a 64k ulimit on all our Hadoop cluster machines, and xceivers is
> set to 2048 on the HBase cluster.
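[For completeness, those two limits are usually set in the following places. The file paths are conventional defaults, and in Hadoop of this era the transceiver property was the historically misspelled dfs.datanode.max.xcievers:]

```shell
# /etc/security/limits.conf (sketch): raise the open-file limit for the
# user running the DataNode/RegionServer processes ("hadoop" here is an
# assumed username):
#
#   hadoop  -  nofile  65536

# hdfs-site.xml (sketch): raise the DataNode transceiver limit:
#
#   <property>
#     <name>dfs.datanode.max.xcievers</name>
#     <value>2048</value>
#   </property>
```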
>
>
> >
> > Hows it running for you?
> >
>
> I will post some real numbers next week once we have it running for 7 days
> with the current config.
>
> I won't say we have nailed everything down, but it's better than what we
> started with.
>
> Any input will be really helpful -- anything you think we are doing that's
> stupid, or that we're totally missing :-)
>
