From Nicolas Spiegelberg <nspiegelb...@fb.com>
Subject Re: Major Compaction Concerns
Date Sat, 14 Jan 2012 02:06:49 GMT
>I'm sorry but I don't understand. Of course I have disk and network
>saturation: the flush stops flushing because it is waiting for
>compactions to finish. Since a major compaction was triggered, all the
>stores (a large number) present on the disks (7 disks per RS) will be
>queued for major compaction, and the I/O is affected. Network is also
>affected since all are major compacting at the same time and
>replicating files at the same time (1GB network).

When you have an IO problem, there are multiple pieces at play that you
can adjust:

Write: HLog, Flush, Compaction
Read: Point Query, Scan

If you write far more than you read, then you should relax one of the
write pieces.
- HLog: You can't really adjust HLog IO outside of key compression
- Flush: You can adjust your compression.  None->LZO == 5x compression.
LZO->GZ == 2x compression.  Both are at the expense of CPU.  HBASE-4241
minimizes flush IO significantly in the update-heavy use case (I
discussed this in the last email).
- Compaction: You can lower the compaction ratio to minimize the amount of
rewriting over time.  That's why I suggested changing the ratio from 1.2 ->
0.25.  This gives a ~50% IO reduction (blog post on this forthcoming @
http://www.facebook.com/UsingHBase ); see the sketch after this list.
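
For concreteness, here's a minimal sketch of those two knobs using the
0.92-era Java API.  The table & family names are hypothetical, and the
compaction ratio is normally set server-side in hbase-site.xml rather
than on a client Configuration:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.HColumnDescriptor;
    import org.apache.hadoop.hbase.client.HBaseAdmin;
    import org.apache.hadoop.hbase.io.hfile.Compression;

    public class WriteIoKnobs {
      public static void main(String[] args) throws Exception {
        // Compaction ratio: lives in hbase-site.xml on the RegionServers;
        // set on a Configuration here purely for illustration.
        Configuration conf = HBaseConfiguration.create();
        conf.setFloat("hbase.hstore.compaction.ratio", 0.25f);

        // Flush IO via compression: switch a (hypothetical) column family
        // to LZO.  Requires the native LZO libraries on the cluster.
        HBaseAdmin admin = new HBaseAdmin(conf);
        HColumnDescriptor cf = new HColumnDescriptor("mycf");
        cf.setCompressionType(Compression.Algorithm.LZO);
        admin.disableTable("mytable");
        admin.modifyColumn("mytable", cf);  // replaces the CF descriptor
        admin.enableTable("mytable");
      }
    }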

However, you may have a lot more reads than you think.  For example, let's
say your read:write ratio is 1:10, so significantly write dominated.
Without any of the optimizations I listed in the previous email, your real
read IO is multiplied by the StoreFile count (because you naively read all
StoreFiles).  So let's say, during congestion, you have 20 StoreFiles.
1*20:10 means that you're now 2:1 read dominated.  You need features to
reduce the number of StoreFiles you scan when the StoreFile count is high.

- Point Query: bloom filters (HBASE-1200, HBASE-2794), lazy seek
(HBASE-4465), and seek optimizations (HBASE-4433, HBASE-4434, HBASE-4469,
...).
- Scan: not as many optimizations here.  They mostly revolve around proper
usage & seek-next optimization when using filters.  I don't have JIRA
numbers here, but probably a half-dozen small tweaks were added to 0.92.

>I don't have an increment workload (the workload either updates columns
>on a CF or adds columns to a CF for the same key), so how will those
>patches help?

Increment & read->update workloads end up benefiting from roughly the same
optimizations.  Adding a column to an existing row is no different from
adding a new row as far as optimizations are concerned, because there's
nothing to de-dupe.

>I don't say this is a bad thing, this is just an observation from our
>side: HBase will slow down the flush in case too many store files are
>present, which will add pressure on GC and memory, affecting performance.
>The update workload does not send all the row content for a certain key,
>so only partial data is written; in order to get the whole row I presume
>that reading the newest Store is not enough ("all" stores need to be
>read, collecting the most up-to-date fields to rebuild a full row), or
>am I missing something?

Reading all row columns is the same as doing a scan.  You're not doing a
point query if you don't specify the exact key (columns) you're looking
for.  Setting versions to unlimited, then getting all versions of a
particular ROW+COL would also be considered a scan vs a point query as far
as optimizations are concerned.
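
To make the distinction concrete, here's a small sketch with hypothetical
table/family/qualifier names (0.92 client API): a Get that names its
exact columns is a point query, while a whole-row Get or an
unlimited-versions Get is effectively a scan as far as these
optimizations go:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.Get;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.util.Bytes;

    public class GetVsScan {
      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "mytable");

        // Point query: exact ROW+COL, latest version.  Eligible for
        // bloom filters, lazy seek & the seek optimizations above.
        Get point = new Get(Bytes.toBytes("user123"));
        point.addColumn(Bytes.toBytes("mycf"), Bytes.toBytes("field1"));
        Result r1 = table.get(point);

        // Whole-row read: no exact columns given, so it behaves like a
        // scan & may touch every StoreFile.
        Get wholeRow = new Get(Bytes.toBytes("user123"));
        Result r2 = table.get(wholeRow);

        // All versions of one ROW+COL: also scan-like.
        Get allVersions = new Get(Bytes.toBytes("user123"));
        allVersions.addColumn(Bytes.toBytes("mycf"),
            Bytes.toBytes("field1"));
        allVersions.setMaxVersions();  // unlimited versions
        Result r3 = table.get(allVersions);
      }
    }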

>1. If I did not set a specific property for bloom filters (BF), does it
>mean that I'm not using them (the book only refers to BF with regards
>to ...)?

By default, bloom filters are disabled, so you need to enable them to get
the optimizations.  This is by design: bloom filters trade off cache
space for low-overhead probabilistic queries.  The default is 8 bytes per
bloom entry (key) & a 1% false positive rate.  You can use 'bin/hbase
org.apache.hadoop.hbase.io.hfile.HFile' (look at the help, then use -f to
specify a StoreFile and -m for meta info) to see your StoreFile's average
KV size.  If size(KV) == 100 bytes, then blooms use 8% of the space in
cache, which is better than loading a StoreFile block only to get a miss.
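
For example, an invocation might look like this (the StoreFile path is
hypothetical; the fileinfo printed by -m includes the average key &
value lengths, which give you size(KV)):

    bin/hbase org.apache.hadoop.hbase.io.hfile.HFile -m -f \
      /hbase/mytable/<region>/<family>/<storefile>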

Whether to use a ROW or ROWCOL bloom filter depends on your write & read
pattern.  If you read the entire row at a time, use a ROW bloom.  If you
point query, ROW or ROWCOL are both options.  If you write all columns for
a row at the same time, definitely use a ROW bloom.  If you have a small
column range and you update them at different rates/times, then a ROWCOL
bloom filter may be more helpful.  ROWCOL is really useful if a scan query
for a ROW will normally return results, but a point query for a ROWCOL may
have a high miss rate.  A perfect example is storing unique hash values
for a user on disk.  You'd use 'user' as the row & the hash as the column.
In most instances, the hash won't be a duplicate, so a ROWCOL bloom would
be a big win.
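
If you decide to enable them, here's a minimal sketch (hypothetical
table & family names; in the 0.92 API the bloom type is set on the
column family descriptor):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.HColumnDescriptor;
    import org.apache.hadoop.hbase.client.HBaseAdmin;
    import org.apache.hadoop.hbase.regionserver.StoreFile;

    public class EnableBlooms {
      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HBaseAdmin admin = new HBaseAdmin(conf);

        // Pick ROW or ROWCOL per the guidance above.  Note this replaces
        // the family descriptor, so carry over any existing settings.
        HColumnDescriptor cf = new HColumnDescriptor("mycf");
        cf.setBloomFilterType(StoreFile.BloomType.ROWCOL);

        admin.disableTable("mytable");
        admin.modifyColumn("mytable", cf);
        admin.enableTable("mytable");
      }
    }

Keep in mind that blooms are written per-StoreFile at flush/compaction
time, so existing StoreFiles won't have them until they're rewritten.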

>3. How can we ensure that compaction will not suck too much I/O if we
>cannot control major compaction?

TCP Congestion Control will ensure that a single TCP socket won't consume
too much bandwidth, so that part of compactions is handled automatically.
The part that you need to handle is the number of simultaneous TCP sockets
(currently 1, until multi-threaded compactions land) & the aggregate data
volume transferred over time.  As I said, the latter is controlled by the
compaction ratio.  If temporarily high StoreFile counts cause you to
bottleneck, the slight latency variance is an annoyance of the current
compaction algorithm, but the underlying problem you should be looking at
solving is the system's inability to filter out unnecessary StoreFiles.
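
Relatedly, the "flush slows down when too many StoreFiles are present"
behavior from your observation is governed by a couple of knobs.  A
sketch with illustrative values (these are server-side hbase-site.xml
settings; shown on a Configuration purely for illustration):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;

    public class StoreFileBlockingKnobs {
      public static void main(String[] args) {
        Configuration conf = HBaseConfiguration.create();
        // Updates to a region block once any Store has this many files...
        conf.setInt("hbase.hstore.blockingStoreFiles", 12);
        // ...until a compaction finishes or this many ms have elapsed.
        conf.setLong("hbase.hstore.blockingWaitTime", 90000L);
      }
    }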
