hbase-dev mailing list archives

From Nicolas Spiegelberg <nspiegelb...@fb.com>
Subject Re: Major Compaction Concerns
Date Thu, 12 Jan 2012 21:44:21 GMT

>The system is an OLTP system, with strict latency and throughput
>requirements, regions are pre-splitted and throughput is controlled.
>The system has heavy load period for few hours, during heavy load i mean
>high proportion insert/update and small proportion of read.

I'm not sure about the production status of your system, but it sounds
like you have a critical need for dozens of optimization features coming
out in 0.92, and even for some trunk patches.  In particular, update speed
has been drastically improved by lazy seek.  Although you can get
incremental wins from different compaction features, you will get much
larger wins from looking at other features right now.

>we fall in the memstore flush throttling (
>will wait 90000 ms before flushing the memstore) retaining more logs,
>triggering more flush that can't be flushed.... adding pressure on the
>system memory (memstore is not flushed on time)

Filling up the logs faster than you can flush normally indicates that you
have disk or network saturation.  If you have an increment workload, I
know there are a number of patches in 0.92 that will drastically reduce
your flush size (1: read memstore before going to disk, 2: don't flush all
versions).  You don't have a compaction problem, you have a write/read

In 0.92, you can try setting your compaction.ratio down (0.25 is a good
start).  This increases the StoreFile count, which slows reads but saves
network IO on writes.  This setting is very similar to the defaults
suggested in the BigTable paper.  However, it is only going to cut your
network IO in half.  The LevelDB or BigTable algorithms can reduce your
outlier StoreFile count, but they wouldn't be able to cut this IO volume
down much either.
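
To make the effect of compaction.ratio concrete, here is a simplified
sketch of ratio-based file selection.  It is not the HBase source: the
function name, the min_files/max_files values, and the sample sizes are
assumptions for illustration, and details such as the minimum compaction
size are left out.  Files are ordered oldest to newest, and an old file is
skipped when it is larger than ratio times the total size of the newer
files in the window.

```python
def select_for_compaction(sizes, ratio, min_files=3, max_files=10):
    """Pick a contiguous run of StoreFiles to compact (simplified sketch).

    sizes: StoreFile sizes ordered oldest to newest.
    Returns (start, end) slice indices of the selected files, or None
    when too few files qualify.  min_files and max_files are
    illustrative values, not taken from the original message.
    """
    n = len(sizes)
    start = 0
    while n - start >= min_files:
        # Total size of the newer files that would be merged with this one.
        newer = sizes[start + 1 : start + max_files]
        if sizes[start] <= ratio * sum(newer):
            break  # this file is small enough relative to the newer ones
        start += 1  # too large: leave it alone and try the next file
    if n - start < min_files:
        return None  # not enough candidates left, so no compaction
    return (start, min(n, start + max_files))


if __name__ == "__main__":
    sizes = [1000, 100, 50, 40, 30]  # hypothetical sizes, oldest first
    print(select_for_compaction(sizes, 1.2))   # compacts the four newest
    print(select_for_compaction(sizes, 0.25))  # nothing selected
```

With the lower ratio the same set of files is left uncompacted, which is
the trade described above: more StoreFiles on disk, less rewrite IO.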

>Please remember i'm on 0.90.1 so when major compaction is running minor is
>blocked, when a memstore for a column family is flushed all other memstore
>(for other) column family are also (no matter if they are smaller or not).
>As you already wrote, the best way is to manage compaction, and it is what
>i tried to do.

Per-storefile compactions & multi-threaded compactions were added in 0.92
to address this problem.  However, a high StoreFile count is not
necessarily a bad thing.  For an update workload, you only have to read the
newest StoreFile, and lazy seek optimizes your situation a lot (again 0.92).
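
A toy model of the point about update workloads, assuming files are
consulted newest first and the lookup stops at the first hit.  This is not
how lazy seek is implemented in HBase; the function and the dict-based
"files" are hypothetical and only illustrate why a frequently updated row
does not pay for the older StoreFiles.

```python
def read_latest(store_files, row):
    """Return (value, files_touched) for the newest version of a row.

    store_files: list of dicts mapping row -> value, ordered newest first.
    Hypothetical stand-in for StoreFiles, for illustration only.
    """
    files_touched = 0
    for store_file in store_files:
        files_touched += 1
        if row in store_file:
            # Newest version found: older files are never opened.
            return store_file[row], files_touched
    return None, files_touched


if __name__ == "__main__":
    files = [{"r1": "v3"}, {"r1": "v2", "r2": "a"}, {"r1": "v1"}]
    print(read_latest(files, "r1"))  # recently updated row: 1 file touched
    print(read_latest(files, "r2"))  # older row: has to look further back
```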

>Regarding the compaction plug-ability needs.
>Let suppose that the data you are inserting in different column family has
>a different pattern, for example on CF1 (column family #1) you update
>fields in the same row key while in CF2 you add each time new fields or
>CF2 has new row and older rows are never updated won't you use different
>algorithms for compacting these CF?

There are mainly 3 different workloads that require different
optimizations (not necessarily compaction-related):
1. Read old data.  Should properly use bloom filters to filter out
2. R+W.  Will really benefit from lazy seeks & cache on write (0.92), far
more than from a compaction algorithm
3. Write mostly.  Don't really care about compactions here.  Just don't
want them to be sucking too much IO

>Finally the schema design is guided by the ACID property of a row, we have
>2 CF only both CF holds a different volume of data even if they are
>Updated approximately with the same amount of data (cell updated vs cell

Note that 0.90 only had row-based write atomicity.  HBASE-2856 is
necessary for row-based read atomicity across column families.
