hbase-user mailing list archives

From Jean-Daniel Cryans <jdcry...@apache.org>
Subject Re: heavy writing and compaction storms
Date Thu, 12 Jan 2012 18:12:58 GMT
Hi,

First you should consider using bulk import instead of a massive MR
job. If you decide against that, then

 - make sure you pre-split:
http://hbase.apache.org/book/important_configurations.html#disable.splitting
 - regarding major compactions, usually people switch off the
automatic mode and cron it to run like X times a week during low
traffic (in your case, just don't use them during the import)
 - set the MEMSTORE_FLUSHSIZE to a really high number during the
import so that you flush big files and compact as little as possible
(a rough sketch of the pre-split and flush size settings follows
below). The default configs work best for a real-time load, not an import.
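
Something like this, as a rough sketch against the 0.90 client API (the
table name, column family, split count and the 256 MB flush size are
made-up values, not recommendations):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.HColumnDescriptor;
    import org.apache.hadoop.hbase.HTableDescriptor;
    import org.apache.hadoop.hbase.client.HBaseAdmin;
    import org.apache.hadoop.hbase.util.Bytes;

    public class PreSplitImportTable {
      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HBaseAdmin admin = new HBaseAdmin(conf);

        HTableDescriptor desc = new HTableDescriptor("import_table");
        desc.addFamily(new HColumnDescriptor("f"));
        // Flush big files during the import: 256 MB instead of the 64 MB default.
        desc.setMemStoreFlushSize(256L * 1024 * 1024);

        // Pre-split into 16 regions, assuming the keys hash to evenly
        // distributed hex strings, so the split points are "10", "20", ..., "f0".
        byte[][] splits = new byte[15][];
        for (int i = 1; i <= 15; i++) {
          splits[i - 1] = Bytes.toBytes(String.format("%x0", i));
        }
        admin.createTable(desc, splits);
      }
    }

The flush size can also be changed on an existing table from the shell
(alter with MEMSTORE_FLUSHSIZE); just remember to drop it back down once
the import is over, since the defaults are tuned for a real-time load.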

Also I guess you already know that you need a big heap, no swapping, etc.

Regarding the number of regions and memstore size, an ideal config is
one where all the memstores can fill up completely before any of them
has to flush. hbase.regionserver.global.memstore.upperLimit is the
maximum fraction of the heap that all memstores together can occupy,
and hbase.regionserver.global.memstore.lowerLimit is the point at which
the region server starts force-flushing regions. Take that into account too.
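
As a back-of-envelope example of how those limits interact with the
flush size (all numbers below are assumptions, not recommendations):

    public class MemstoreMath {
      public static void main(String[] args) {
        // Hypothetical region server: 8 GB heap, the 0.90 defaults of 0.4 / 0.35
        // for the global memstore limits, and a 256 MB per-region flush size.
        long heap = 8L * 1024 * 1024 * 1024;
        double upperLimit = 0.40;  // hbase.regionserver.global.memstore.upperLimit
        double lowerLimit = 0.35;  // hbase.regionserver.global.memstore.lowerLimit
        long flushSize = 256L * 1024 * 1024;

        long forceFlushAt = (long) (heap * lowerLimit);  // ~2.8 GB: force flushing starts
        long blockAt = (long) (heap * upperLimit);       // ~3.2 GB: writes get blocked

        System.out.println("force flush point (bytes): " + forceFlushAt);
        System.out.println("write block point (bytes): " + blockAt);
        // How many regions per server can carry a completely full memstore
        // before the lower limit kicks in: 11 with these numbers.
        System.out.println("full memstores per server: " + (forceFlushAt / flushSize));
      }
    }

With numbers like these, a server carrying many hundreds of regions under
evenly spread writes will hit the global limit long before any single
memstore reaches its flush size, which is why the flush size and region
count have to be considered together.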

Hope this helps get you started.

J-D

On Thu, Jan 12, 2012 at 9:56 AM, Neil Yalowitz <neilyalowitz@gmail.com> wrote:
> Hi all,
>
> What strategies do HBase 0.90 users employ to deal with or avoid the
> so-called "compaction storm"?  I'm referring to the issue described in
> section 2.8.2.7 here:
>
> http://hbase.apache.org/book.html#important_configurations
>
> The MR job I'm working with executes many PUTs during the Map phase with
> HTable.put() in batches of 1,000.  The keys are well distributed which,
> while ideal for evenly distributed PUT performance, means StoreFiles
> accumulate at roughly the same rate on every region.  When a compaction threshold is
> reached for one region, it is usually reached for many regions... causing
> many, many regions to request compaction.  Seems like a classic "compaction
> storm" problem.  With a thousand regions all requesting compaction, the
> compactionQueueSize will quickly climb for a server.
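
(For what it's worth, the write pattern being described is roughly the
following; the table name, family and row keys are illustrative, and in
the real job this would run inside the map task.)

    import java.util.ArrayList;
    import java.util.List;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.util.Bytes;

    public class BatchedPuts {
      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "import_table");
        table.setAutoFlush(false);  // buffer Puts client-side instead of one RPC each

        List<Put> batch = new ArrayList<Put>(1000);
        for (int i = 0; i < 100000; i++) {
          Put put = new Put(Bytes.toBytes(String.format("%08d", i)));  // illustrative row key
          put.add(Bytes.toBytes("f"), Bytes.toBytes("q"), Bytes.toBytes("value-" + i));
          batch.add(put);
          if (batch.size() == 1000) {  // send one batch of 1,000 Puts at a time
            table.put(batch);
            batch.clear();
          }
        }
        if (!batch.isEmpty()) {
          table.put(batch);
        }
        table.flushCommits();
        table.close();
      }
    }
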
>
> Some options we have discussed for this problem:
>
> 1) an HBase cooldown - slowing down the writes by feeding the input files
> at a slower interval
>
> I'm not certain this will fix the problem.  It still seems likely that
> evenly distributed writes will eventually trigger many regions to request
> compaction.
>
> 2) an HBase cooldown with a major_compact - disabling all automatic
> compaction by setting the compaction thresholds at a very high number and
> then running a major_compact on the two tables our MR job writes to
>
> I'm using the following settings to completely disable all compaction:
>
> hbase.regionserver.thread.splitcompactcheckfrequency = Integer.MAX_VALUE
> (is this setting deprecated in 0.90?  what about 0.92?)
> hbase.hstore.compactionThreshold = Integer.MAX_VALUE
> hbase.hstore.blockingStoreFiles  = Integer.MAX_VALUE
> hbase.hstore.compaction.max = Integer.MAX_VALUE
> hbase.hstore.blockingWaitTime = 0
>
> This looks ugly, but it seems to be the only way to ensure that compaction
> will not occur (unless I'm missing something).  Obviously, a system that is
> not periodically manually compacted will eventually go down in flames with
> these settings.
>
> 3) manually compact only certain regions - disabling all automatic
> compaction as mentioned in #2 and have a separate job that polls the
> regions and compacts certain regions according to need, but not allowing
> all regions to compact automatically
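
A minimal sketch of what that per-region poller could look like with the
0.90 client API follows; the table name and the fixed pause are
placeholders, and a real version would presumably check store file counts
or compactionQueueSize before issuing each request:

    import java.util.Map;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.HRegionInfo;
    import org.apache.hadoop.hbase.HServerAddress;
    import org.apache.hadoop.hbase.client.HBaseAdmin;
    import org.apache.hadoop.hbase.client.HTable;

    public class RollingCompactor {
      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HBaseAdmin admin = new HBaseAdmin(conf);
        HTable table = new HTable(conf, "import_table");

        Map<HRegionInfo, HServerAddress> regions = table.getRegionsInfo();
        for (HRegionInfo region : regions.keySet()) {
          // Ask for a major compaction of one region at a time. The call only
          // queues a request, so the sleep just spaces the requests out rather
          // than waiting for the compaction to finish.
          admin.majorCompact(region.getRegionNameAsString());
          Thread.sleep(60 * 1000L);
        }
        table.close();
      }
    }
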
>
>
> What are other people's experiences with this issue?  Performing all
> compaction during a cooldown period (#2)?  Performing compaction in a
> rolling fashion (#3)?  Slower writes (#1)?  Something completely different?
>
>
> Thanks,
>
> Neil Yalowitz
