hbase-user mailing list archives

From Jean-Daniel Cryans <jdcry...@apache.org>
Subject Re: Reducing impact of compactions on read performance
Date Tue, 18 May 2010 17:15:14 GMT
>
> 1. Do more frequent, smaller minor compactions.  I guess we would accomplish
> this by lowering hbase.hstore.compactionThreshold,
> hbase.hstore.blockingStoreFiles, and/or hbase.hstore.compaction.max?

Without any log files to analyze, it's hard to tell exactly what kind
of compaction (minor or major) and/or split is happening. Minor
compactions don't rewrite all store files and don't try to merge big
files. Do you monitor your cluster? Do you see a lot of I/O wait when
reads are slowing down?
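
For reference, the properties you list are region server settings that
belong in hbase-site.xml (a restart is needed for them to take effect).
The sketch below only illustrates what each one controls; the values are
placeholders, not recommendations:

import org.apache.hadoop.hbase.HBaseConfiguration;

public class CompactionTuning {
  public static void main(String[] args) {
    // Region server settings; shown programmatically only to illustrate
    // the property names. In practice set them in hbase-site.xml on each
    // region server. Values here are placeholders.
    HBaseConfiguration conf = new HBaseConfiguration();

    // Number of store files in a store that triggers a minor compaction.
    conf.setInt("hbase.hstore.compactionThreshold", 3);

    // Store file count at which HBase blocks further updates to the
    // region until compaction brings the count back down.
    conf.setInt("hbase.hstore.blockingStoreFiles", 7);

    // Upper bound on how many store files one compaction merges at once.
    conf.setInt("hbase.hstore.compaction.max", 10);
  }
}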

>
> 2. Try to prevent compactions altogether and just cron one major compaction
> per day when the system load is at its lowest.  Not sure that this is a good
> idea.  Does anyone currently do this?

Yes, you can cron major compactions, although I still can't tell if that's what you're hitting.
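
If you go the cron route, one way is a tiny driver against the client
API that you schedule at your lowest-load hour; a minimal sketch,
assuming a table called "mytable":

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HBaseAdmin;

// Queues a major compaction of one table and returns; the compaction
// itself runs asynchronously on the region servers.
public class NightlyMajorCompact {
  public static void main(String[] args) throws Exception {
    HBaseAdmin admin = new HBaseAdmin(new HBaseConfiguration());
    admin.majorCompact("mytable");  // placeholder table name
  }
}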

>
> 3. I noticed that we're sometimes getting messages like "Too many hlogs:
> logs=33, maxlogs=32; forcing flush of 24 regions(s)".  Should we disable the
> write-ahead log when doing bulk updates?  I'm not entirely clear on the
> relationship between log flushing/rolling and minor/major compactions.  As I
> understand it, a log flush will create HFiles, which might then trigger a
> minor compaction.  Is that correct?  Would disabling WAL help?

HBase limits the rate of inserts so that it doesn't get overrun by WALs;
that way, if a machine fails, you don't have to split GBs of log files.
What about inserting more slowly into your cluster? Flushes and
compactions would then be spread out more over time.

Disabling the WAL during your insert will make it a lot faster, but
that's not necessarily what you want here.
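
If you do decide to skip the WAL for a bulk load (accepting that those
edits are lost if a region server dies before its memstores flush), it's
a per-Put flag in the client API; a minimal sketch with placeholder
table/row/column names:

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

// Skips the write-ahead log for one edit. Only do this when the load
// can be re-run from the source data if a region server crashes.
public class WalLessLoad {
  public static void main(String[] args) throws Exception {
    HTable table = new HTable(new HBaseConfiguration(), "mytable");
    Put put = new Put(Bytes.toBytes("row1"));
    put.add(Bytes.toBytes("cf"), Bytes.toBytes("qual"), Bytes.toBytes("value"));
    put.setWriteToWAL(false);  // this edit won't go to the WAL
    table.put(put);
    table.flushCommits();
  }
}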

>
> 4. Hardware upgrade.  We're running one 7200RPM SATA disk per
> datanode/regionserver now, so our I/O throughput probably isn't great.  We
> will soon be testing a new hardware configuration with 2 SSDs per node.  I'm
> sure this will help, but I'm looking for some short-term solutions that will
> work until we migrate to the new hardware.

Like Ryan said, just shove as many 7.2k RPM disks as you can into each
machine. Google has 12 per borg (number from their Petasort
benchmark).

>
> Have there been any performance improvements since 0.20.3 (other than
> HBASE-2180 which we already have) that might help?  What is the best upgrade
> path if we were to upgrade our production HBase cluster in the next 1-2
> weeks?  0.20.5?  Build a snapshot from trunk/0.21?  CDH3?

HBASE-2248 will help you a lot. Deploy 0.20.5 in a dev environment when
it's ready, then, once you're confident, restart your production HBase
on the new jars.

>
> Thanks,
> James
>
