hbase-issues mailing list archives

From "Vladimir Rodionov (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-14468) Compaction improvements: FIFO compaction policy
Date Sun, 04 Oct 2015 19:42:26 GMT

    [ https://issues.apache.org/jira/browse/HBASE-14468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14942782#comment-14942782 ]

Vladimir Rodionov commented on HBASE-14468:
-------------------------------------------

{quote}
Yes, it looks like we can achieve FIFO behavior by using the existing ExploringCompactionPolicy.
We have to set the CF TTL, disable periodic major compactions, and set the minimum number of files
to compact to a very large value. But even if that works, I would prefer a separate policy; at
least it is self-explanatory.
{quote}

No, we can't, because ExploringCompactionPolicy always checks the number of store files against
the minimum number of files to compact, and if there are fewer, no compaction is requested.
Therefore we can't raise the minimum number of files to compact to a very large value, and we need
a separate compaction policy for FIFO-style compaction.
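To make the argument concrete, here is a minimal Java sketch of the gate described above, assuming the only relevant behavior is "fewer candidate files than the minimum means nothing is selected". The class `MinFilesGateSketch` and its `select` method are hypothetical; they model the author's description and do not reproduce the real `ExploringCompactionPolicy` code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of the min-files gate described above.
// NOT the real ExploringCompactionPolicy code.
final class MinFilesGateSketch {

    /**
     * Returns the files that would be selected for compaction, or an empty
     * list when there are fewer candidates than minFilesToCompact.
     */
    static List<String> select(List<String> candidateFiles, int minFilesToCompact) {
        if (candidateFiles.size() < minFilesToCompact) {
            return new ArrayList<>(); // gate not passed: no compaction requested
        }
        return new ArrayList<>(candidateFiles);
    }

    public static void main(String[] args) {
        // Assume all three files are already fully expired.
        List<String> expiredFiles = List.of("hfile-1", "hfile-2", "hfile-3");

        // Normal minimum: the files are selected and can be cleaned up.
        System.out.println(select(expiredFiles, 3).size()); // prints 3

        // "Very large" minimum, as proposed in the quote: nothing is ever
        // selected, so the expired files are never collected on this path.
        System.out.println(select(expiredFiles, Integer.MAX_VALUE).size()); // prints 0
    }
}
```

With the minimum pushed to a very large value, even fully expired files never pass the gate, which is why the quoted workaround does not give FIFO behavior.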

> Compaction improvements: FIFO compaction policy
> -----------------------------------------------
>
>                 Key: HBASE-14468
>                 URL: https://issues.apache.org/jira/browse/HBASE-14468
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Vladimir Rodionov
>            Assignee: Vladimir Rodionov
>             Fix For: 2.0.0
>
>         Attachments: HBASE-14468-v1.patch, HBASE-14468-v2.patch, HBASE-14468-v3.patch, HBASE-14468-v4.patch
>
>
> h2. FIFO Compaction
> h3. Introduction
> The FIFO compaction policy selects only store files in which all cells have expired. The column family MUST have a non-default TTL.
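The selection rule can be sketched as follows, assuming a store file is summarized by the timestamp of its newest cell and that a cell counts as expired once its timestamp is older than `now - TTL`. `FifoSelectionSketch` and its nested `StoreFile` class are hypothetical; `FOREVER` mirrors HBase's default TTL (`HConstants.FOREVER`, i.e. `Integer.MAX_VALUE` seconds). This is an illustration, not the `FIFOCompactionPolicy` source.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the FIFO selection rule under simplifying assumptions;
// not the FIFOCompactionPolicy source.
final class FifoSelectionSketch {

    // Mirrors HBase's default "TTL = FOREVER" value; TTL is in seconds.
    static final int FOREVER = Integer.MAX_VALUE;

    /** Hypothetical store-file model: only the newest cell timestamp matters here. */
    static final class StoreFile {
        final String name;
        final long maxTimestampMs;

        StoreFile(String name, long maxTimestampMs) {
            this.name = name;
            this.maxTimestampMs = maxTimestampMs;
        }
    }

    /**
     * Selects files whose newest cell is already older than now - TTL, i.e.
     * files in which every cell has expired. Partially expired files are kept.
     */
    static List<StoreFile> selectExpired(List<StoreFile> files, int ttlSeconds, long nowMs) {
        if (ttlSeconds == FOREVER) {
            throw new IllegalArgumentException("FIFO compaction requires a non-default TTL");
        }
        long oldestLiveTimestampMs = nowMs - ttlSeconds * 1000L;
        List<StoreFile> expired = new ArrayList<>();
        for (StoreFile f : files) {
            if (f.maxTimestampMs < oldestLiveTimestampMs) {
                expired.add(f);
            }
        }
        return expired;
    }

    public static void main(String[] args) {
        long nowMs = 100_000L;
        List<StoreFile> files = List.of(
                new StoreFile("old", 10_000L),   // newest cell older than now - TTL
                new StoreFile("mixed", 50_000L), // still holds live cells
                new StoreFile("new", 90_000L));
        // TTL of 60 s puts the cutoff at 40_000 ms.
        for (StoreFile f : selectExpired(files, 60, nowMs)) {
            System.out.println(f.name); // prints old
        }
    }
}
```

Because only the newest timestamp matters, a file holding even one live cell is kept whole; expired files are simply dropped, never rewritten.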
> Essentially, the FIFO compactor does only one job: it collects expired store files. I see many applications for this policy:
> # Use it for very high-volume raw data that has a short TTL and is the source of other data (after additional processing). Example: raw time-series vs. time-based rollup aggregates and compacted time-series. We collect raw time-series and store them in a CF with the FIFO compaction policy; periodically we run a task that creates rollup aggregates and compacts the time-series, after which the original raw data can be discarded.
> # Use it for data that can be kept entirely in a block cache (RAM/SSD). Say we have a local SSD (1 TB) that we can use as a block cache: there is no need to compact the raw data at all.
> Because we do not do any real compaction, we spend no CPU or IO (disk and network) on rewriting data, and we do not evict hot data from the block cache. The result: improved throughput and latency for both writes and reads.
> See: https://github.com/facebook/rocksdb/wiki/FIFO-compaction-style
> h3. To enable FIFO compaction policy
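> For a table:
> {code}
> import org.apache.hadoop.hbase.HTableDescriptor;
> import org.apache.hadoop.hbase.regionserver.DefaultStoreEngine;
> import org.apache.hadoop.hbase.regionserver.compactions.FIFOCompactionPolicy;
>
> // Override the default store engine's compaction policy for this table
> HTableDescriptor desc = new HTableDescriptor(tableName);
> desc.setConfiguration(DefaultStoreEngine.DEFAULT_COMPACTION_POLICY_CLASS_KEY,
>     FIFOCompactionPolicy.class.getName());
> {code}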
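> For a column family:
> {code}
> import org.apache.hadoop.hbase.HColumnDescriptor;
> import org.apache.hadoop.hbase.regionserver.DefaultStoreEngine;
> import org.apache.hadoop.hbase.regionserver.compactions.FIFOCompactionPolicy;
>
> // Override the default store engine's compaction policy for this CF only
> HColumnDescriptor desc = new HColumnDescriptor(family);
> desc.setConfiguration(DefaultStoreEngine.DEFAULT_COMPACTION_POLICY_CLASS_KEY,
>     FIFOCompactionPolicy.class.getName());
> {code}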
> h3. Limitations
> Do not use FIFO compaction if:
> * the table/CF has MIN_VERSIONS > 0
> * the table/CF has TTL = FOREVER (HColumnDescriptor.DEFAULT_TTL)
> * the table/CF is MOB-enabled
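The limitations above can be turned into a pre-flight check before switching a column family to FIFO compaction. This is a minimal sketch: `FifoCompatibilitySketch` and `CfSettings` are hypothetical stand-ins for the relevant column family attributes (TTL in seconds, `MIN_VERSIONS`, and whether MOB is enabled), not HBase classes, and `FOREVER` mirrors `HColumnDescriptor.DEFAULT_TTL`.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical pre-flight check encoding the limitations listed above.
// CfSettings is an illustrative stand-in, not an HBase class.
final class FifoCompatibilitySketch {

    // Mirrors HBase's default TTL (FOREVER), in seconds.
    static final int FOREVER = Integer.MAX_VALUE;

    static final class CfSettings {
        final int ttlSeconds;
        final int minVersions;
        final boolean mobEnabled;

        CfSettings(int ttlSeconds, int minVersions, boolean mobEnabled) {
            this.ttlSeconds = ttlSeconds;
            this.minVersions = minVersions;
            this.mobEnabled = mobEnabled;
        }
    }

    /** Returns the violated limitations; an empty list means none of them applies. */
    static List<String> fifoProblems(CfSettings cf) {
        List<String> problems = new ArrayList<>();
        if (cf.minVersions > 0) {
            problems.add("MIN_VERSIONS > 0");
        }
        if (cf.ttlSeconds == FOREVER) {
            problems.add("TTL = FOREVER");
        }
        if (cf.mobEnabled) {
            problems.add("MOB is enabled");
        }
        return problems;
    }

    public static void main(String[] args) {
        // One-hour TTL, no MIN_VERSIONS, no MOB: FIFO is a candidate.
        System.out.println(fifoProblems(new CfSettings(3600, 0, false))); // prints []
        // Default TTL with MIN_VERSIONS = 1: two limitations apply.
        System.out.println(fifoProblems(new CfSettings(FOREVER, 1, false))); // prints [MIN_VERSIONS > 0, TTL = FOREVER]
    }
}
```

Returning the list of violated conditions, rather than a single boolean, makes it easy to report why a given table or CF is not a good fit.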



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
