hbase-issues mailing list archives

From "Enis Soztutar (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-14468) Compaction improvements: FIFO compaction policy
Date Wed, 28 Oct 2015 03:06:27 GMT

    [ https://issues.apache.org/jira/browse/HBASE-14468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14977632#comment-14977632 ]

Enis Soztutar commented on HBASE-14468:

This is a good idea. We should add this to the list of compaction policies, with good documentation.
We have use cases where the TTL is a couple of days. One example is a metrics store that keeps
raw data in a high-ingest scenario.

For the patch itself, the first {{if}} is not needed if we are checking for {{DisabledRegionSplitPolicy}} anyway:

+    if(splitPolicyClassName.equals(IncreasingToUpperBoundRegionSplitPolicy.class.getName())){
+      throw new RuntimeException("Default split policy for FIFO compaction"+
+          " is not supported, aborting.");
+    } else if( !splitPolicyClassName.equals(DisabledRegionSplitPolicy.class.getName())){
+      warn.append(":region splits must be disabled:");
+    } 
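
A minimal sketch of the simplified check, assuming the only accepted policy is {{DisabledRegionSplitPolicy}}. The class name {{SplitPolicyCheckSketch}} and the method {{checkSplitPolicy}} are hypothetical, and the policy class names are written as string literals so the sketch compiles without HBase on the classpath. Whether the failure should be a warning or a hard error is left open here; the sketch returns the warning text from the patch.

```java
final class SplitPolicyCheckSketch {
    // Stand-in for DisabledRegionSplitPolicy.class.getName() (literal used so this compiles standalone).
    static final String DISABLED_SPLIT_POLICY =
        "org.apache.hadoop.hbase.regionserver.DisabledRegionSplitPolicy";

    /**
     * Returns the warning text when splits are not disabled, or an empty string otherwise.
     * The default policy needs no branch of its own: it is simply "not the disabled policy".
     */
    static String checkSplitPolicy(String splitPolicyClassName) {
        if (!DISABLED_SPLIT_POLICY.equals(splitPolicyClassName)) {
            return ":region splits must be disabled:";
        }
        return "";
    }
}
```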

Can we make it so that if a split happens we still compact the reference files, but we do
not compact otherwise? We can also allow very-slow splits in the case where the reference
files will be cleaned out due to TTL. In this case, a region can still split every TTL interval.

Will the thrown RuntimeExceptions cause region opening to fail, or the RS to abort? Can we hook the
verification code into {{HMaster.sanityCheckTableDescriptor()}}, so that you cannot alter or create
a table with those settings? This would be a much better experience for the user.
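
A rough sketch of what such an up-front check could look like, assuming it receives the relevant settings as plain values. {{FifoSanityCheckSketch}} and {{validate}} are hypothetical names, not the actual {{HMaster}} code; the rules are taken from the limitations in the issue description and from the split policy check in the patch, and FOREVER is assumed to be {{Integer.MAX_VALUE}}.

```java
import java.util.ArrayList;
import java.util.List;

final class FifoSanityCheckSketch {
    // Stand-in for DisabledRegionSplitPolicy.class.getName() (literal used so this compiles standalone).
    static final String DISABLED_SPLIT_POLICY =
        "org.apache.hadoop.hbase.regionserver.DisabledRegionSplitPolicy";
    // Assumption: the "no TTL" default (FOREVER) is modeled as Integer.MAX_VALUE.
    static final int FOREVER = Integer.MAX_VALUE;

    /** Collects every problem so that a create/alter request can be rejected with one message. */
    static List<String> validate(int ttlSeconds, int minVersions, String splitPolicyClassName) {
        List<String> problems = new ArrayList<>();
        if (ttlSeconds == FOREVER) {
            problems.add("FIFO compaction requires a non-default TTL");
        }
        if (minVersions > 0) {
            problems.add("FIFO compaction does not support MIN_VERSION > 0");
        }
        if (!DISABLED_SPLIT_POLICY.equals(splitPolicyClassName)) {
            problems.add("region splits must be disabled");
        }
        return problems;
    }
}
```

Rejecting the descriptor at create or alter time means the user sees the message immediately, instead of a region failing to open later.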

Can we also simplify the configuration for these? Maybe we auto-disable major compactions,
and set the blocking store files if they are not set?
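
One way this could look, sketched over a plain map rather than the real descriptor or {{Configuration}} classes. {{FifoDefaultsSketch}}, {{withFifoDefaults}} and the value {{1000}} are hypothetical; the sketch also assumes that setting {{hbase.hregion.majorcompaction}} to 0 is how periodic major compactions get disabled.

```java
import java.util.HashMap;
import java.util.Map;

final class FifoDefaultsSketch {
    static final String MAJOR_COMPACTION_PERIOD = "hbase.hregion.majorcompaction";
    static final String BLOCKING_STORE_FILES = "hbase.hstore.blockingStoreFiles";
    // Hypothetical "very large" default; the right value is a tuning decision.
    static final String HYPOTHETICAL_BLOCKING_FILES = "1000";

    /** Returns a copy of the user's settings with FIFO-friendly defaults applied. */
    static Map<String, String> withFifoDefaults(Map<String, String> userConf) {
        Map<String, String> conf = new HashMap<>(userConf);
        // Always turn off periodic major compactions (assumed: a period of 0 disables them).
        conf.put(MAJOR_COMPACTION_PERIOD, "0");
        // Only fill in the blocking file count when the user has not chosen one.
        conf.putIfAbsent(BLOCKING_STORE_FILES, HYPOTHETICAL_BLOCKING_FILES);
        return conf;
    }
}
```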

Can we use {{HStore.removeUnneededFiles()}} or {{storeEngine.getStoreFileManager()}}, which already
implement the is-expired logic, so that there is no duplication there?
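
For reference, the is-expired test amounts to something like the sketch below: a file is fully expired once its newest cell is older than the TTL cutoff. This is an illustration only, with hypothetical names ({{ExpiredFilesSketch}}, {{isFullyExpired}}, {{selectExpired}}) and files reduced to their newest cell timestamp; the existing HBase code may differ in details.

```java
import java.util.ArrayList;
import java.util.List;

final class ExpiredFilesSketch {
    /** A file is fully expired when even its newest cell is older than (now - TTL). */
    static boolean isFullyExpired(long newestCellTimestampMs, long ttlMs, long nowMs) {
        return newestCellTimestampMs < nowMs - ttlMs;
    }

    /** Returns the newest-cell timestamps of the files that a FIFO-style policy could drop. */
    static List<Long> selectExpired(List<Long> newestCellTimestampsMs, long ttlMs, long nowMs) {
        List<Long> expired = new ArrayList<>();
        for (long ts : newestCellTimestampsMs) {
            if (isFullyExpired(ts, ttlMs, nowMs)) {
                expired.add(ts);
            }
        }
        return expired;
    }
}
```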

> Compaction improvements: FIFO compaction policy
> -----------------------------------------------
>                 Key: HBASE-14468
>                 URL: https://issues.apache.org/jira/browse/HBASE-14468
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Vladimir Rodionov
>            Assignee: Vladimir Rodionov
>             Fix For: 2.0.0
>         Attachments: HBASE-14468-v1.patch, HBASE-14468-v2.patch, HBASE-14468-v3.patch,
HBASE-14468-v4.patch, HBASE-14468-v5.patch, HBASE-14468-v6.patch
> h2. FIFO Compaction
> h3. Introduction
> FIFO compaction policy selects only files in which all cells have expired. The column family MUST have a non-default TTL.
> Essentially, the FIFO compactor does only one job: it collects expired store files. I see many applications for this policy:
> # Use it for very high-volume raw data that has a low TTL and is the source of other data (after additional processing). Example: raw time series vs. time-based rollup aggregates and compacted time series. We collect raw time series and store them in a CF with the FIFO compaction policy; periodically we run a task that creates rollup aggregates and compacts the time series, after which the original raw data can be discarded.
> # Use it for data that can be kept entirely in a block cache (RAM/SSD). Say we have a local SSD (1TB) that we can use as a block cache. No need for compaction of the raw data at all.
> Because we do not do any real compaction, we do not use CPU and IO (disk and network), and we do not evict hot data from the block cache. The result: improved throughput and latency for both writes and reads.
> See: https://github.com/facebook/rocksdb/wiki/FIFO-compaction-style
> h3. To enable FIFO compaction policy
> For table:
> {code}
> HTableDescriptor desc = new HTableDescriptor(tableName);
>     desc.setConfiguration(DefaultStoreEngine.DEFAULT_COMPACTION_POLICY_CLASS_KEY, 
>       FIFOCompactionPolicy.class.getName());
> {code} 
> For CF:
> {code}
> HColumnDescriptor desc = new HColumnDescriptor(family);
>     desc.setConfiguration(DefaultStoreEngine.DEFAULT_COMPACTION_POLICY_CLASS_KEY, 
>       FIFOCompactionPolicy.class.getName());
> {code}
> Make sure that the table has region splits disabled (either by explicitly setting DisabledRegionSplitPolicy or by setting ConstantSizeRegionSplitPolicy with a very large max region size). You will also have to increase the store's blocking file number, *hbase.hstore.blockingStoreFiles*, to a very large value.
> h3. Limitations
> Do not use FIFO compaction if:
> * Table/CF has MIN_VERSION > 0
> * Table/CF has TTL = FOREVER (HColumnDescriptor.DEFAULT_TTL)

This message was sent by Atlassian JIRA