hbase-issues mailing list archives

From "Enis Soztutar (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-15181) A simple implementation of date based tiered compaction
Date Thu, 18 Feb 2016 01:14:18 GMT

    [ https://issues.apache.org/jira/browse/HBASE-15181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15151531#comment-15151531 ]

Enis Soztutar commented on HBASE-15181:

[~claraxiong] this is great work BTW. Thanks for pushing for this. 

I just wanted to bring one open item back to jira to see whether ordering files by timestamp,
rather than by seqId, and doing non-contiguous compactions is acceptable: 
The tiered structure is built completely and solely on the data timestamp of the store files.
We cannot sort by seqId at all. Any logic for updates/deletes depending on seqId would break.
The user needs to guarantee updates or deletes are in order aligned with time stamp order.
This compaction policy is pluggable and this limitation will be lifted if the work to allow
compaction out of order of seqId is done. As you pointed out in the ticket: "What I was saying
offline is that we can actually do something like HBASE-9905 and disallow client-settable
timestamps, or do something like HBASE-10247 where the table pre-declares that we won't have
same-ts edits, it should be possible to do non-contiguous compactions."

Given that there are no hard guarantees as of now about whether the client can do out-of-order
timestamp writes, can we still always be correct, with the only cost being that compaction
performs less efficiently if the client does an excessive amount of these writes? Basically, if we can, I would
like a system where the client will get the full benefit automatically if the timestamps follow
seqId order, but if not, the results are still correct. If there are occasional out-of-order
writes, the performance is not that badly affected, if not, the compaction algorithm can behave

I think we can achieve this with something like this: 
 - Use max ts as in the design for store files. 
 - Instead of ordering files by decreasing ts, order files by decreasing seqId. 
 - Iterating from highest seqId to lowest, find the tier that the file belongs to using maxTs.
The only difference from the current algorithm is that in the iteration, we should always
assign tiers in increasing order t0, t1, t2. This means that if out of order data is present,
and we end up with flushes where maxTs is very old, let's say it falls into t2, then t1 and
t0 would be empty and all files will be t2+. Otherwise (if you do not have out of order writes,
or have them occasionally) the behavior will be the same as in the design. 
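
The iteration described above can be sketched roughly as follows. This is only an illustration in Python, not HBase code: the `StoreFile` record, the `tier_by_age` helper, and the age-based `boundaries` list are all hypothetical stand-ins for whatever window scheme the design actually uses. The point it shows is the one rule that differs from the current algorithm: walking from highest to lowest seqId, the assigned tier is never allowed to decrease.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class StoreFile:
    # Hypothetical stand-in for a store file: only the two fields the sketch needs.
    seq_id: int
    max_ts: int


def tier_by_age(max_ts, now, boundaries):
    """Map a timestamp to a tier index using ascending age boundaries.

    With boundaries [10, 100]: age < 10 is t0, 10 <= age < 100 is t1,
    and anything older is t2. The boundary scheme is an assumption.
    """
    age = max(0, now - max_ts)
    return bisect_right(boundaries, age)


def assign_tiers(files, now, boundaries):
    """Assign tiers while iterating from highest seqId to lowest.

    The tier derived from maxTs is clamped so that it never decreases
    along the iteration, which keeps every tier contiguous in seqId order.
    """
    assigned = []
    current = 0
    for f in sorted(files, key=lambda sf: sf.seq_id, reverse=True):
        current = max(current, tier_by_age(f.max_ts, now, boundaries))
        assigned.append((f.seq_id, current))
    return assigned


if __name__ == "__main__":
    files = [
        StoreFile(seq_id=5, max_ts=998),
        StoreFile(seq_id=4, max_ts=950),
        StoreFile(seq_id=3, max_ts=995),  # newer data in an older file
        StoreFile(seq_id=2, max_ts=100),
        StoreFile(seq_id=1, max_ts=960),
    ]
    for seq_id, tier in assign_tiers(files, now=1000, boundaries=[10, 100]):
        print(f"seqId={seq_id} -> t{tier}")
```

With in-order data the clamp never triggers, so the result matches plain maxTs tiering. If the newest flush carries a very old maxTs, every file behind it lands in that same old tier, which is the degraded but still correct case.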

Alternatively HFiles also have CREATE_TIME_TS, which is different than maxTimestamp. maxTS
comes from the user data, while hfile create time is the system time at the time of hfile
writing. If we do the tier selection based on hfile time instead of users maxTs, then we might
not even have that problem. Again, if there is actual correlation of user's timestamps with
the seqIds (or hfile create times), you would get all the benefits, otherwise, we would still
return the correct results, but compaction may not be optimal (I think it will be like falling
back to the exploring policy). Anyway, just a suggestion to consider. I might not have thought of
all corner cases. 
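
To make the difference between the two keys concrete, here is a small Python sketch. It is illustrative only: `FileInfo`, `tier`, and the `boundaries` list are hypothetical, and it assumes that hfile create times follow seqId order, which may not hold for every file (compaction outputs, for example, would need separate thought). It computes the tier of each file in decreasing seqId order and checks whether the tiers come out non-decreasing, i.e. whether each tier is contiguous in seqId order.

```python
from bisect import bisect_right
from typing import NamedTuple


class FileInfo(NamedTuple):
    # Hypothetical record; create_ts stands in for the hfile's CREATE_TIME_TS.
    seq_id: int
    max_ts: int
    create_ts: int


def tier(ts, now, boundaries):
    """Tier index for a timestamp, given ascending age boundaries (assumed scheme)."""
    return bisect_right(boundaries, max(0, now - ts))


def tiers_in_seq_order(files, now, boundaries, key):
    """Tiers of the files listed from highest seqId to lowest, using `key` as the timestamp."""
    ordered = sorted(files, key=lambda f: f.seq_id, reverse=True)
    return [tier(key(f), now, boundaries) for f in ordered]


def is_contiguous(tiers):
    """True when tiers never decrease, so each tier covers one contiguous seqId range."""
    return all(a <= b for a, b in zip(tiers, tiers[1:]))


if __name__ == "__main__":
    files = [
        FileInfo(seq_id=3, max_ts=100, create_ts=999),  # late-arriving old data
        FileInfo(seq_id=2, max_ts=995, create_ts=990),
        FileInfo(seq_id=1, max_ts=950, create_ts=940),
    ]
    by_max_ts = tiers_in_seq_order(files, 1000, [10, 100], key=lambda f: f.max_ts)
    by_create = tiers_in_seq_order(files, 1000, [10, 100], key=lambda f: f.create_ts)
    print(by_max_ts, is_contiguous(by_max_ts))
    print(by_create, is_contiguous(by_create))
```

In this example the user-supplied maxTs produces tiers that interleave across seqIds, while the create-time key does not, at the cost of the tiers no longer reflecting the data's own timestamps when the two diverge.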

You are saying that this patch is also in production. Are there any numbers you've collected?

> A simple implementation of date based tiered compaction
> -------------------------------------------------------
>                 Key: HBASE-15181
>                 URL: https://issues.apache.org/jira/browse/HBASE-15181
>             Project: HBase
>          Issue Type: New Feature
>          Components: Compaction
>            Reporter: Clara Xiong
>            Assignee: Clara Xiong
>             Fix For: 2.0.0, 1.3.0, 0.98.19
>         Attachments: HBASE-15181-v1.patch, HBASE-15181-v2.patch
> This is a simple implementation of date-based tiered compaction similar to Cassandra's
> for the following benefits:
> 1. Improve date-range-based scan by structuring store files in date-based tiered layout.
> 2. Reduce compaction overhead.
> 3. Improve TTL efficiency.
> Perfect fit for the use cases that:
> 1. have mostly date-based data writes and scans and a focus on the most recent data.
> 2. never or rarely delete data.
> Out-of-order writes are handled gracefully so the data will still get to the right store
> file for time-range scans, and re-compaction with existing store files in the same time window
> is handled by ExploringCompactionPolicy.
> Time range overlapping among store files is tolerated and the performance impact is minimized.
> Configuration can be set in hbase-site or overridden at the per-table or per-column-family
> level by hbase shell.
> Design spec is at https://docs.google.com/document/d/1_AmlNb2N8Us1xICsTeGDLKIqL6T-oHoRLZ323MG_uy8/edit?usp=sharing

This message was sent by Atlassian JIRA
