hbase-user mailing list archives

From James Baldassari <jbaldass...@gmail.com>
Subject Re: Reducing impact of compactions on read performance
Date Tue, 18 May 2010 17:57:16 GMT
Resending this to hbase-user@hadoop.apache.org because my mail to
user@hbase.apache.org failed with "550 550 mail to user@hbase.apache.org not
accepted here (state 14)".  Is the reply-to getting set correctly?  Anyway,
responses inline...

On Tue, May 18, 2010 at 1:15 PM, Jean-Daniel Cryans <jdcryans@apache.org> wrote:

> > 1. Do more frequent, smaller minor compactions.  I guess we would
> > accomplish this by lowering hbase.hstore.compactionThreshold,
> > hbase.hstore.blockingStoreFiles, and/or hbase.hstore.compaction.max?
>
> Without any log files to analyze, it's hard to tell exactly what kind
> of compaction (minor/major) and/or split is happening. Minor
> compactions don't rewrite all store files and don't try to merge big
> files. Do you monitor your cluster? Do you see a lot of IO wait when
> reads are slowing down?

Here is a region server log from yesterday: http://pastebin.com/5a04kZVj
Every time one of those compactions ran (around 1pm, 4pm, 6pm, etc.), our
read performance took a big hit.  BTW, is there a way I can tell by looking
at the logs whether a minor or major compaction is running?  Yes, we do see
lots of I/O wait (as high as 30-40% at times) when the compactions are
running and reads are slow.  Load averages during compactions can spike as
high as 60.
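
For what it's worth, here's the sort of hbase-site.xml change I had in mind
for option 1.  The values below are just guesses on my part, not tested
recommendations, and I've left out hbase.hstore.blockingStoreFiles because
I'm not sure yet which direction we'd want to move it:

  <!-- compact more often, in smaller batches -->
  <property>
    <name>hbase.hstore.compactionThreshold</name>
    <value>2</value>  <!-- default 3; fewer store files trigger a minor compaction -->
  </property>
  <property>
    <name>hbase.hstore.compaction.max</name>
    <value>5</value>  <!-- default 10; caps how many store files one compaction rewrites -->
  </property>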

> > 2. Try to prevent compactions altogether and just cron one major
> > compaction per day when the system load is at its lowest.  Not sure
> > that this is a good idea.  Does anyone currently do this?
>
> Cron major compactions, although I still can't tell if it's what you're
> hitting.

OK, I'll set up a cron job to kick off majors when load is at its lowest;
something like the sketch below.  Can't hurt, I suppose.
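
Roughly this in the crontab, assuming HBase lives in /opt/hbase and the
table is called 'mytable' (both made up for the example):

  # kick off a daily major compaction at 4am, when our load is lowest
  0 4 * * * echo "major_compact 'mytable'" | /opt/hbase/bin/hbase shell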

> > 3. I noticed that we're sometimes getting messages like "Too many hlogs:
> > logs=33, maxlogs=32; forcing flush of 24 regions(s)".  Should we disable
> > the write-ahead log when doing bulk updates?  I'm not entirely clear on
> > the relationship between log flushing/rolling and minor/major
> > compactions.  As I understand it, a log flush will create HFiles, which
> > might then trigger a minor compaction.  Is that correct?  Would
> > disabling WAL help?
>
> HBase limits the rate of inserts so that it doesn't get overrun by WALs;
> that way, if a machine fails, you don't have to split GBs of files. What
> about inserting more slowly into your cluster? Flushes/compactions would
> be more spread out over time.
> Disabling the WAL during your insert will make it a lot faster, but it's
> not necessarily what you want here.

Our inserts are already fairly fast.  I think we usually get around
30,000/sec when we do these bulk imports.  I'm less concerned about insert
speed and more concerned about the impact to reads when we do the bulk
imports and a compaction is triggered.  Do you think it makes sense to
disable WAL for the bulk inserts in this case?  Would disabling WAL decrease
the number of compactions that are required?
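
If we do end up trying it, I assume it's just a matter of something like
this in our import client (table/family/column names made up, and based on
my reading of the 0.20 client API):

  import java.io.IOException;
  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.client.HTable;
  import org.apache.hadoop.hbase.client.Put;
  import org.apache.hadoop.hbase.util.Bytes;

  public class NoWalImport {
      public static void main(String[] args) throws IOException {
          HTable table = new HTable(new HBaseConfiguration(), "mytable");
          table.setAutoFlush(false);   // buffer puts client-side during the bulk load
          Put put = new Put(Bytes.toBytes("row1"));
          put.add(Bytes.toBytes("cf"), Bytes.toBytes("col"), Bytes.toBytes("value"));
          put.setWriteToWAL(false);    // skip the WAL: faster, but the row is lost if
                                       // the region server dies before its memstore
                                       // is flushed
          table.put(put);
          table.flushCommits();        // send the buffered puts to the region servers
      }
  }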

> > 4. Hardware upgrade.  We're running one 7200RPM SATA disk per
> > datanode/regionserver now, so our I/O throughput probably isn't great.
> > We will soon be testing a new hardware configuration with 2 SSDs per
> > node.  I'm sure this will help, but I'm looking for some short-term
> > solutions that will work until we migrate to the new hardware.
>
> Like Ryan said, just shove as many 7.2k RPM disks as you can into each
> machine. Google has 12 per borg (number from their Petasort
> benchmark).

Yes, thanks to you and Ryan for pointing this out.  I didn't realize how
important it was to have multiple disks in each node.  The performance
issues we've been having are probably due to I/O bottlenecks more than
anything else.  If a hardware upgrade is the final answer, that's fine.  I
was just hoping for something that would help in the short term until we get
hardware with more/faster disks.

> > Have there been any performance improvements since 0.20.3 (other than
> > HBASE-2180, which we already have) that might help?  What is the best
> > upgrade path if we were to upgrade our production HBase cluster in the
> > next 1-2 weeks?  0.20.5?  Build a snapshot from trunk/0.21?  CDH3?
>
> HBASE-2248 will help you a lot. Deploy 0.20.5 in a dev env when it's
> ready, then when you are confident, restart your HBase prod on the new
> jars.

OK, I'm eagerly awaiting the next release.  Seems like there have been lots
of good improvements since 0.20.3!

> >
> > Thanks,
> > James
> >
