hbase-issues mailing list archives

From "Vladimir Rodionov (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-14468) Compaction improvements: FIFO compaction policy
Date Tue, 05 Jan 2016 19:48:40 GMT

    [ https://issues.apache.org/jira/browse/HBASE-14468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15083657#comment-15083657
] 

Vladimir Rodionov commented on HBASE-14468:
-------------------------------------------

[~lhofhansl]

I think you should backport HBASE-10141 to 0.98 as well. I am going to mark HBASE-14467 as
invalid.

> Compaction improvements: FIFO compaction policy
> -----------------------------------------------
>
>                 Key: HBASE-14468
>                 URL: https://issues.apache.org/jira/browse/HBASE-14468
>             Project: HBase
>          Issue Type: Improvement
>          Components: Compaction, Performance
>            Reporter: Vladimir Rodionov
>            Assignee: Vladimir Rodionov
>             Fix For: 2.0.0, 1.2.0, 1.3.0
>
>         Attachments: 14468-0.98-v2.txt, 14468-0.98.txt, HBASE-14468-v1.patch, HBASE-14468-v10.patch,
> HBASE-14468-v2.patch, HBASE-14468-v3.patch, HBASE-14468-v4.patch, HBASE-14468-v5.patch, HBASE-14468-v6.patch,
> HBASE-14468-v7.patch, HBASE-14468-v8.patch, HBASE-14468-v9.patch, HBASE-14468.add.patch
>
>
> h2. FIFO Compaction
> h3. Introduction
> The FIFO compaction policy selects only those files in which all cells have expired. The column family
> MUST have a non-default TTL.
> Essentially, the FIFO compactor does only one job: it collects expired store files. These are
> some applications which could benefit the most:
> # Use it for very high-volume raw data which has a low TTL and which is the source of other
> data (after additional processing). Example: raw time-series vs. time-based rollup aggregates
> and compacted time-series. We collect raw time-series and store them in a CF with the FIFO compaction
> policy; periodically we run a task which creates rollup aggregates and compacts the time-series,
> and the original raw data can be discarded after that.
> # Use it for data which can be kept entirely in a block cache (RAM/SSD). Say we have a
> local SSD (1 TB) which we can use as a block cache. There is no need for compaction of the raw data at
> all.
> Because we do not do any real compaction, we do not use CPU and I/O (disk and network), and
> we do not evict hot data from the block cache. The result: improved throughput and latency for both
> writes and reads.
> See: https://github.com/facebook/rocksdb/wiki/FIFO-compaction-style
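> The selection rule above can be sketched as follows (a simplified illustration, not the actual HBase implementation; the class and method names are hypothetical): a store file contains only expired cells once its newest cell timestamp plus the CF TTL lies in the past, and such a file can simply be deleted without rewriting any data.
> {code}
> // Hypothetical sketch of the FIFO eligibility check, not HBase code:
> // a store file whose newest cell is older than (now - TTL) holds only
> // expired cells and can be dropped outright.
> class FifoSketch {
>   static boolean allCellsExpired(long maxCellTimestampMs, long ttlSeconds, long nowMs) {
>     return maxCellTimestampMs + ttlSeconds * 1000L < nowMs;
>   }
> }
> {code}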
> h3. To enable FIFO compaction policy
> For table:
> {code}
> HTableDescriptor desc = new HTableDescriptor(tableName);
> desc.setConfiguration(DefaultStoreEngine.DEFAULT_COMPACTION_POLICY_CLASS_KEY,
>   FIFOCompactionPolicy.class.getName());
> {code} 
> For CF:
> {code}
> HColumnDescriptor desc = new HColumnDescriptor(family);
> desc.setConfiguration(DefaultStoreEngine.DEFAULT_COMPACTION_POLICY_CLASS_KEY,
>   FIFOCompactionPolicy.class.getName());
> {code}
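> The same can be done at table-creation time from the HBase shell (a sketch; the table and family names are examples, and the configuration key shown is the string value behind DefaultStoreEngine.DEFAULT_COMPACTION_POLICY_CLASS_KEY). Note the non-default TTL, which FIFO compaction requires:
> {code}
> hbase> create 'raw_ts', {NAME => 'cf', TTL => 86400,
>   CONFIGURATION => {'hbase.hstore.defaultengine.compactionpolicy.class' =>
>     'org.apache.hadoop.hbase.regionserver.compactions.FIFOCompactionPolicy'}}
> {code}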
> Although region splitting is supported, for optimal performance it should be disabled,
> either by setting DisabledRegionSplitPolicy explicitly or by setting ConstantSizeRegionSplitPolicy
> with a very large max region size. You will also have to increase the store's blocking
> file count, *hbase.hstore.blockingStoreFiles*, to a very large number.
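> For example, both settings could be applied on the table descriptor (a sketch using the same API as above; the value 1000 is an arbitrary illustration, not a recommendation):
> {code}
> HTableDescriptor desc = new HTableDescriptor(tableName);
> // Disable splitting entirely for this table.
> desc.setRegionSplitPolicyClassName(DisabledRegionSplitPolicy.class.getName());
> // Raise the per-store blocking file count so writes are not throttled
> // as uncompacted store files accumulate (example value).
> desc.setConfiguration("hbase.hstore.blockingStoreFiles", "1000");
> {code}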
>  
> h3. Limitations
> Do not use FIFO compaction if :
> * Table/CF has MIN_VERSION > 0
> * Table/CF has TTL = FOREVER (HColumnDescriptor.DEFAULT_TTL)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
