hbase-issues mailing list archives

From "Dave Latham (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HBASE-15181) A simple implementation of date based tiered compaction
Date Wed, 02 Mar 2016 17:01:18 GMT

     [ https://issues.apache.org/jira/browse/HBASE-15181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dave Latham updated HBASE-15181:
--------------------------------
    Release Note: 
Date tiered compaction is a compaction policy that lays out store files by date, which benefits
time-range scans over time-series data.

When it performs well:

    reads for limited time ranges, especially scans of recent data

When it doesn't perform as well:

    random gets without a time range
    frequent deletes and updates
    out of order data writes, especially writes with timestamps in the future
    bulk loads of historical data
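The benefit for time-range reads comes largely from skipping store files whose timestamp range cannot overlap the query's range. The Java sketch below is a simplified model, assuming each file records the minimum and maximum timestamp of its cells; `StoreFileRange` and `TimeRangePruning` are hypothetical names, not HBase types. With a date-tiered layout each file covers a narrow, mostly non-overlapping slice of time, so a scan of recent data touches few files, whereas a get with no time range has to consider every file.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical summary of one store file: its name and the timestamp range of its cells. */
final class StoreFileRange {
    final String name;
    final long minTs; // smallest cell timestamp in the file (inclusive)
    final long maxTs; // largest cell timestamp in the file (inclusive)

    StoreFileRange(String name, long minTs, long maxTs) {
        this.name = name;
        this.minTs = minTs;
        this.maxTs = maxTs;
    }
}

/** Hypothetical helper: decides which files a time-range query must read. */
final class TimeRangePruning {
    /** Returns the names of files whose [minTs, maxTs] overlaps the query range [from, to). */
    static List<String> filesToRead(List<StoreFileRange> files, long from, long to) {
        List<String> needed = new ArrayList<>();
        for (StoreFileRange f : files) {
            // Overlap test: the file must end at or after 'from' and start before 'to'.
            if (f.maxTs >= from && f.minTs < to) {
                needed.add(f.name);
            }
        }
        return needed;
    }
}
```

For example, with files covering [0, 99], [100, 199] and [200, 299], a query over [150, 250) only needs the last two files.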

Recommended configuration:
To turn on Date Tiered Compaction:
hbase.hstore.compaction.compaction.policy: org.apache.hadoop.hbase.regionserver.compactions.DateTieredCompactionPolicy

Parameters for Date Tiered Compaction:
hbase.hstore.compaction.date.tiered.max.storefile.age.millis: files whose max timestamp is older
than this age will no longer be compacted. Default: Long.MAX_VALUE.
hbase.hstore.compaction.date.tiered.base.window.millis: base window size in milliseconds.
Default: 6 hours.
hbase.hstore.compaction.date.tiered.windows.per.tier: number of windows per tier. Default: 4.
hbase.hstore.compaction.date.tiered.incoming.window.min: minimum number of files to compact
in the incoming window. Set it to the expected number of files in the window to avoid wasteful
compaction. Default: 6.
hbase.hstore.compaction.date.tiered.window.policy.class: the policy used to select store files
within the same time window. It doesn't apply to the incoming window. Default: exploring
compaction. This is to avoid wasteful compaction.
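To make the base window and windows-per-tier parameters concrete, here is a simplified Java sketch of how time windows could be laid out going back from the current time. It assumes windows are aligned to multiples of their own size and that a window is promoted to the next tier only once it lines up with that tier's boundary; `Window` and `TieredWindows` are hypothetical names, and this is not the code of DateTieredCompactionPolicy. Because the windows are aligned to absolute time rather than to each server's own history, every server reaches a tier boundary at the same moment.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical half-open time window [start, end) in milliseconds. */
final class Window {
    final long start; // inclusive
    final long end;   // exclusive

    Window(long start, long end) {
        this.start = start;
        this.end = end;
    }

    @Override
    public String toString() {
        return "[" + start + ", " + end + ")";
    }
}

/** Hypothetical sketch of an aligned, tiered window layout. */
final class TieredWindows {
    /**
     * Lists windows from newest to oldest, starting with the incoming window that
     * contains 'now'. A window keeps its tier's size until its start is aligned to a
     * boundary of the next tier (windowsPerTier times larger); the next earlier window
     * is then promoted to that larger size. Stops once windows end at or before lowerBound.
     */
    static List<Window> windows(long now, long baseWindowMillis, int windowsPerTier, long lowerBound) {
        List<Window> result = new ArrayList<>();
        long size = baseWindowMillis;
        long index = now / size; // index of the incoming window; assumes now >= 0
        while (index >= 0) {
            long start = index * size;
            if (start + size <= lowerBound) {
                break; // this window, and every earlier one, lies entirely before the bound
            }
            result.add(new Window(start, start + size));
            if (index % windowsPerTier != 0) {
                index -= 1; // step back one window within the current tier
            } else {
                size *= windowsPerTier;             // promote to the next tier
                index = index / windowsPerTier - 1; // the larger window that ends at 'start'
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Toy units: base window 6, 4 windows per tier, "now" = 110.
        for (Window w : windows(110, 6, 4, 0)) {
            System.out.println(w);
        }
    }
}
```

With these toy units, `main` prints [108, 114), [102, 108), [96, 102), [72, 96), [48, 72), [24, 48), [0, 24): three base-size windows followed by four windows of size 24.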

With tiered compaction, all servers in the cluster will promote windows to a higher tier at the
same time, so using a compaction throttle is recommended:
hbase.regionserver.throughput.controller: org.apache.hadoop.hbase.regionserver.compactions.PressureAwareCompactionThroughputController

Because there will most likely be more store files around, we need to adjust the configuration
so that flushes won't be blocked and compaction will be properly throttled:
hbase.hstore.blockingStoreFiles: change to 50 if using all default parameters when turning
on date tiered compaction. If you change the parameters, use 1.5 to 2 times the projected
file count, where:
projected file count = windows per tier x tier count + incoming window min + files older than max age
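The sizing rule for blockingStoreFiles is plain arithmetic; a small Java helper makes it explicit. The tier count and the number of files older than the max age are inputs you would have to estimate for your own data, and `BlockingStoreFiles` with its methods is a hypothetical name used only for illustration.

```java
/** Hypothetical helper for the blockingStoreFiles sizing rule in the release note. */
final class BlockingStoreFiles {
    /** projected = windowsPerTier * tierCount + incomingWindowMin + filesOlderThanMaxAge */
    static long projectedFileCount(int windowsPerTier, int tierCount,
                                   int incomingWindowMin, int filesOlderThanMaxAge) {
        return (long) windowsPerTier * tierCount + incomingWindowMin + filesOlderThanMaxAge;
    }

    /** Returns {low, high}: 1.5x the projection (rounded up) and 2x the projection. */
    static long[] suggestedRange(long projected) {
        long low = (long) Math.ceil(projected * 1.5);
        long high = projected * 2;
        return new long[] {low, high};
    }
}
```

For example, 4 windows per tier, an assumed 4 tiers, an incoming window min of 6 and no files past the max age give a projected count of 22, so blockingStoreFiles would fall between 33 and 44.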

For more details, please refer to the design spec at https://docs.google.com/document/d/1_AmlNb2N8Us1xICsTeGDLKIqL6T-oHoRLZ323MG_uy8/edit#

> A simple implementation of date based tiered compaction
> -------------------------------------------------------
>
>                 Key: HBASE-15181
>                 URL: https://issues.apache.org/jira/browse/HBASE-15181
>             Project: HBase
>          Issue Type: New Feature
>          Components: Compaction
>            Reporter: Clara Xiong
>            Assignee: Clara Xiong
>             Fix For: 2.0.0, 1.3.0, 0.98.19, 1.4.0
>
>         Attachments: HBASE-15181-0.98-ADD.patch, HBASE-15181-0.98.patch, HBASE-15181-0.98.v4.patch,
>                      HBASE-15181-98.patch, HBASE-15181-ADD.patch, HBASE-15181-branch-1.patch,
>                      HBASE-15181-master-v1.patch, HBASE-15181-master-v2.patch, HBASE-15181-master-v3.patch,
>                      HBASE-15181-master-v4.patch, HBASE-15181-v1.patch, HBASE-15181-v2.patch
>
>
> This is a simple implementation of date-based tiered compaction, similar to Cassandra's, with
> the following benefits:
> 1. Improve date-range-based scans by structuring store files in a date-based tiered layout.
> 2. Reduce compaction overhead.
> 3. Improve TTL efficiency.
> It is a perfect fit for use cases that:
> 1. have mostly date-based data writes and scans, with a focus on the most recent data.
> 2. never or rarely delete data.
> Out-of-order writes are handled gracefully. Time range overlap among store files is tolerated
> and its performance impact is minimized.
> Configuration can be set in hbase-site.xml or overridden per table or per column family
> through the hbase shell.
> Design spec is at https://docs.google.com/document/d/1_AmlNb2N8Us1xICsTeGDLKIqL6T-oHoRLZ323MG_uy8/edit?usp=sharing
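The TTL benefit follows from the same layout: older tiers hold only old cells, so an entire store file can become expired at once and could be dropped without being rewritten. A minimal Java sketch of that check, assuming each file records the timestamp of its newest cell; `TtlExpiry` and `FileSummary` are hypothetical names, not HBase classes.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: finds store files in which every cell has passed the TTL. */
final class TtlExpiry {
    /** Hypothetical per-file summary: file name plus the timestamp of its newest cell. */
    static final class FileSummary {
        final String name;
        final long maxTs;

        FileSummary(String name, long maxTs) {
            this.name = name;
            this.maxTs = maxTs;
        }
    }

    /** Returns the names of files whose newest cell is older than now - ttlMillis. */
    static List<String> fullyExpired(List<FileSummary> files, long now, long ttlMillis) {
        long cutoff = now - ttlMillis;
        List<String> expired = new ArrayList<>();
        for (FileSummary f : files) {
            // If even the newest cell is past the cutoff, the whole file is expired.
            if (f.maxTs < cutoff) {
                expired.add(f.name);
            }
        }
        return expired;
    }
}
```

With a date-tiered layout, most files fall cleanly on one side of the cutoff; when old and new cells are mixed in one file, that file cannot be dropped whole.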



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
