hbase-user mailing list archives

From: Jean-Daniel Cryans <jdcry...@apache.org>
Subject: Re: Too many hlogs
Date: Tue, 03 Apr 2012 17:31:30 GMT
On Mon, Apr 2, 2012 at 2:18 PM, Miles Spielberg <miles@box.com> wrote:
> So it sounds like with our write pattern (highly distributed, all regions
> being written to simultaneously), we should be trying to keep the number of
> regions down to 32 (or whatever hbase.regionserver.maxlogs is set to). I
> suppose we could increase the block size, but this would lead to the same
> issue of slow replays as would increasing maxlogs.
>

Or 16 regions with MEMSTORE_FLUSHSIZE at 128MB.
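To make the arithmetic explicit: the WAL bound is roughly maxlogs times
the HLog block size (~64MB with the usual HDFS default), so 32 logs is
your ~2GB, and you want "regions being written to" times the flush size
to stay under that: 32 regions at 64MB, or 16 at 128MB. MEMSTORE_FLUSHSIZE
is the per-table attribute; the cluster-wide equivalent lives in
hbase-site.xml and would look something like this (values here are only
illustrative, not recommendations):

  <property>
    <name>hbase.hregion.memstore.flush.size</name>
    <value>134217728</value> <!-- 128MB per-region flush threshold -->
  </property>
  <property>
    <name>hbase.regionserver.maxlogs</name>
    <value>32</value> <!-- force-flush the regions holding the oldest edits once this many HLogs pile up -->
  </property>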

> Our RegionServers are running with 16 GB heap on 24 GB machines. It sounds
> like we can't meaningfully use this heap with our workload since we want to
> keep MemStores down to ~2 GB to match HLog capacity. (Our read traffic is
> also primarily against recent data, so I guess we won't get much mileage
> out of the block cache for StoreFiles, either.)

One question you should try to answer is whether 2GB is good enough for
you or whether you can tolerate more. With 0.92.0 and distributed log
splitting it's not as big of an issue.
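If I remember right, distributed log splitting in 0.92 is on by default
and controlled by a single switch in hbase-site.xml; I'm going from
memory here, so double-check the property against your version's docs:

  <property>
    <name>hbase.master.distributed.log.splitting</name>
    <value>true</value> <!-- split a dead server's HLogs across the cluster instead of on the master -->
  </property>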

>
> If more regions is just going to get us in this situation again, should we
> also be disabling automatic region splitting? We have a 16 node cluster, so
> a 256 region pre-split would give us 16 regions/server. As these regions
> grow, what is our growth path if more regions/server will lead to us
> hitting maxlogs? Do we have options other than adding additional nodes?

It's usually recommended to stop splitting once you have a good
distribution. If you do split and add regions you'll break that
balance, yes; maybe you'll want to let the HLogs grow bigger than the
total size of your MemStores to buffer that up. In any case, HLogs
that only contain edits from already-flushed regions are cleaned up
without any other impact, so even if you let them grow to 16GB you
might never hit the limit. Keep in mind that a forced flush every now
and then is much better, and probably indiscernible, compared to your
current situation where things go crazy all the time :)
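The usual trick to stop automatic splitting, once you're happy with the
distribution, is to raise the split threshold so high it never triggers
and then split by hand when you actually want to. Something along these
lines in hbase-site.xml (100GB is just an example value):

  <property>
    <name>hbase.hregion.max.filesize</name>
    <value>107374182400</value> <!-- ~100GB: effectively disables automatic region splits -->
  </property>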

When you add region servers you can do rolling splits; that's what
Facebook does for Messages, IIRC.

>
> In a nutshell, it sounds like our choices are:
>
> 1) increase HLog capacity (either via HBase BLOCKSIZE or increasing
> hbase.regionserver.maxlogs), and pay the price of increased downtime when a
> regionserver needs to be restarted.

It's not the HBase BLOCKSIZE, it's the Hadoop block size that you set
in hbase-site.xml. Also verify exactly how long it takes to replay 2GB
on your system.
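To be clear about which knob that is: the HLog rolls when it approaches
the WAL's block size, which defaults to the HDFS block size but can be
overridden for HBase alone in hbase-site.xml. IIRC the property is the
one below (value in bytes; 128MB is only an example):

  <property>
    <name>hbase.regionserver.hlog.blocksize</name>
    <value>134217728</value> <!-- 128MB WAL block size; defaults to the HDFS block size -->
  </property>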

> 2) restrict our nodes to <30 regions per node, and add nodes/split when
> compactions start taking too long.

Restricting the number of regions is always good.

J-D
