hbase-user mailing list archives

From Neil Yalowitz <neilyalow...@gmail.com>
Subject heavy writing and compaction storms
Date Thu, 12 Jan 2012 17:56:56 GMT
Hi all,

What strategies do HBase 0.90 users employ to deal with or avoid the
so-called "compaction storm"?  I'm referring to the issue described here:


The MR job I'm working with executes many PUTs during the Map phase with
HTable.put() in batches of 1,000.  The keys are well distributed, which,
while ideal for evenly distributed PUT performance, makes the number of
StoreFiles grow at the same rate on every region.  When a compaction
threshold is
reached for one region, it is usually reached for many regions... causing
many, many regions to request compaction.  Seems like a classic "compaction
storm" problem.  With a thousand regions all requesting compaction, the
compactionQueueSize will quickly climb for a server.
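
To illustrate why evenly distributed writes line the regions up, here is a
toy simulation.  It is only a sketch under simplifying assumptions that are
not from HBase itself: every region flushes exactly one StoreFile per round,
a region requests compaction when it reaches the threshold, and a compaction
leaves a single StoreFile behind.  The class and method names are
hypothetical.

```java
import java.util.Arrays;

class CompactionStormSketch {

    /**
     * Toy model: in each round every region flushes one StoreFile.
     * A region that reaches the threshold requests compaction and is
     * assumed to end up with a single StoreFile afterwards.
     * Returns the number of compaction requests made in each round.
     */
    static int[] simulate(int[] initialStoreFiles, int rounds, int threshold) {
        int[] files = initialStoreFiles.clone();
        int[] requestsPerRound = new int[rounds];
        for (int round = 0; round < rounds; round++) {
            for (int r = 0; r < files.length; r++) {
                files[r]++;                      // one flush per region per round
                if (files[r] >= threshold) {
                    requestsPerRound[round]++;   // region asks for compaction
                    files[r] = 1;                // assumed result of the compaction
                }
            }
        }
        return requestsPerRound;
    }

    /** Largest number of simultaneous compaction requests in any round. */
    static int peak(int[] requestsPerRound) {
        int max = 0;
        for (int n : requestsPerRound) {
            max = Math.max(max, n);
        }
        return max;
    }

    public static void main(String[] args) {
        // 1,000 regions that all start empty and fill at the same rate.
        int[] even = simulate(new int[1000], 6, 3);
        System.out.println(Arrays.toString(even) + " peak=" + peak(even));
    }
}
```

In this model all 1,000 regions cross the threshold in the same round, which
is the queue spike described above.  Regions that start out of phase spread
the same total work over more rounds.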

Some options we have discussed for this problem:

1) an HBase cooldown - slowing down the writes by feeding the input files
at a slower interval

I'm not certain this will fix the problem.  It still seems likely that
evenly distributed writes will eventually trigger many regions to request
compaction.
2) an HBase cooldown with a major_compact - disabling all automatic
compaction by setting the compaction thresholds at a very high number and
then running a major_compact on the two tables our MR job writes to

I'm using the following settings to completely disable all compaction:

hbase.regionserver.thread.splitcompactcheckfrequency = Integer.MAX_VALUE
(is this setting deprecated in 0.90?  what about 0.92?)
hbase.hstore.compactionThreshold = Integer.MAX_VALUE
hbase.hstore.blockingStoreFiles  = Integer.MAX_VALUE
hbase.hstore.compaction.max = Integer.MAX_VALUE
hbase.hstore.blockingWaitTime = 0

This looks ugly, but it seems to be the only way to ensure that compaction
will not occur (unless I'm missing something).  Obviously, a system that is
not periodically manually compacted will eventually go down in flames with
these settings.

3) manually compact only certain regions - disabling all automatic
compaction as mentioned in #2 and having a separate job that polls the
regions and compacts only those that need it, rather than letting all
regions compact automatically
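
The selection step of such a polling job might look like the sketch below.
It is plain Java with no HBase dependency: the StoreFile counts are passed in
as a map, and the names (RollingCompactionPicker, pickRegions, minStoreFiles,
maxPerPass) are hypothetical.  In a real job the counts would have to come
from the cluster's region load information, and each picked region would be
handed to something like HBaseAdmin.compact(); both of those are assumptions
to check against the HBase version in use.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class RollingCompactionPicker {

    /**
     * Picks at most maxPerPass regions that have at least minStoreFiles
     * StoreFiles, worst offenders first, so that only a bounded number of
     * compactions is requested per polling pass.
     */
    static List<String> pickRegions(Map<String, Integer> storeFilesByRegion,
                                    int minStoreFiles, int maxPerPass) {
        List<Map.Entry<String, Integer>> candidates = new ArrayList<>();
        for (Map.Entry<String, Integer> e : storeFilesByRegion.entrySet()) {
            if (e.getValue() >= minStoreFiles) {
                candidates.add(e);
            }
        }
        // Most StoreFiles first; ties broken by region name for determinism.
        candidates.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        List<String> picked = new ArrayList<>();
        for (Map.Entry<String, Integer> e : candidates) {
            if (picked.size() == maxPerPass) {
                break;
            }
            picked.add(e.getKey());
        }
        return picked;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = new HashMap<>();
        counts.put("region-a", 2);
        counts.put("region-b", 9);
        counts.put("region-c", 5);
        counts.put("region-d", 7);
        // Each name printed here would be passed to the compaction request.
        System.out.println(pickRegions(counts, 4, 2));
    }
}
```

Capping the number of regions per pass keeps the compaction queue short,
at the cost of letting the remaining regions accumulate StoreFiles until a
later pass reaches them.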

What are other people's experiences with this issue?  Performing all
compaction during a cooldown period (#2)?  Performing compaction in a
rolling fashion (#3)?  Slower writes (#1)?  Something completely different?


Neil Yalowitz
