hbase-issues mailing list archives

From "stack (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-15454) Archive store files older than max age
Date Mon, 11 Apr 2016 21:25:25 GMT

    [ https://issues.apache.org/jira/browse/HBASE-15454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15236001#comment-15236001 ]

stack commented on HBASE-15454:

I just read the design doc too and had the same questions as [~davelatham].

I was just talking to a user and here's what they want (I was reading this issue thinking
I could get some of it here?)

 # Heavy writes across the whole key-space; keys are UUIDs.
 # Only data written in the last 30 days is likely to be viewed again.
 # Only in extremely rare cases will older data be looked at. In this latter case, access
times can be 100x those of the 30-day working set.
 # All queries are random reads; no scanning.

We chatted about a few options. I see there is talk here of archiving old data, but in my
scenario the data still needs to be accessible. One thought was a 'live' table and an 'archive'
table, where the 'live' table has a configuration amenable to low-latency serving and the archive
table is configured to carry lots of data at the expense of being slow to read. What
I wanted was an atomic way of moving files older than 30 days from one table to the other (close
region, move, open region is all we have).
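
To make the "files older than 30 days" part concrete, here is a rough sketch of just the selection step. It is not HBase code: the class AgeSplitter, the method archiveCandidates, and the idea of passing in a map from file name to the newest cell timestamp in that file are all assumptions made for illustration. It says nothing about the hard part, which is moving the selected files atomically.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.TimeUnit;

/** Hypothetical helper (not an HBase API): picks files that are candidates for the 'archive' table. */
final class AgeSplitter {
    /** Assumed cutoff, taken from the 30-day working set described above. */
    static final long MAX_AGE_MS = TimeUnit.DAYS.toMillis(30);

    /**
     * @param maxTimestampByFile file name -> newest cell timestamp in that file (ms since epoch)
     * @param nowMs              current time in ms since epoch
     * @return names of files whose newest cell is strictly older than the cutoff
     */
    static List<String> archiveCandidates(Map<String, Long> maxTimestampByFile, long nowMs) {
        List<String> candidates = new ArrayList<>();
        for (Map.Entry<String, Long> e : maxTimestampByFile.entrySet()) {
            // A file qualifies only if even its newest cell has aged out of the live window.
            if (nowMs - e.getValue() > MAX_AGE_MS) {
                candidates.add(e.getKey());
            }
        }
        return candidates;
    }
}
```

Keying on the newest timestamp in a file means a file that mixes old and new cells stays 'live' until all of its data has aged out.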

Thinking some here, it would be cool if there was a category of file that was 'cold'. These files
would not participate in compactions nor in region size accounting (to prevent splits) and
would normally be 'closed' so they consume no resources. If a seek came in for one of these files,
we'd open it, satisfy the seek, and keep it open in case a new seek came in, but otherwise
we'd let it close again. This, or just an atomic means of moving files from online to cold
(table) storage.
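
A minimal sketch of the open-on-seek, close-when-idle behaviour for a 'cold' file might look like the following. Everything in it is hypothetical: ColdFileHandle, seek, closeIfIdle, and the idle timeout are invented names, the "reader" is only a boolean flag, and the returned string stands in for a real lookup. It only illustrates the lifecycle, not how HBase store file readers actually work.

```java
import java.util.function.LongSupplier;

/** Hypothetical sketch (not HBase code): a 'cold' file that is opened lazily and closed when idle. */
final class ColdFileHandle {
    private final String path;
    private final long idleTimeoutMs;
    private final LongSupplier clock;   // injected so the idle logic can be tested without sleeping
    private boolean open;               // stands in for a real reader holding resources
    private long lastAccessMs;
    private int openCount;

    ColdFileHandle(String path, long idleTimeoutMs, LongSupplier clock) {
        this.path = path;
        this.idleTimeoutMs = idleTimeoutMs;
        this.clock = clock;
    }

    /** Serves a seek: opens the file if it is closed, then records the access time. */
    synchronized String seek(String key) {
        if (!open) {
            open = true;        // a real implementation would open the underlying file here
            openCount++;
        }
        lastAccessMs = clock.getAsLong();
        return path + ":" + key;   // placeholder for the actual lookup result
    }

    /** Meant to be called periodically; closes the file if no seek arrived within the timeout. */
    synchronized boolean closeIfIdle() {
        if (open && clock.getAsLong() - lastAccessMs >= idleTimeoutMs) {
            open = false;       // a real implementation would release the reader here
            return true;
        }
        return false;
    }

    synchronized boolean isOpen() { return open; }

    synchronized int openCount() { return openCount; }
}
```

A second seek that arrives before the timeout reuses the open handle; one that arrives after closeIfIdle has fired pays the open cost again, which fits the "100x slower is fine" tolerance for old data.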

> Archive store files older than max age
> --------------------------------------
>                 Key: HBASE-15454
>                 URL: https://issues.apache.org/jira/browse/HBASE-15454
>             Project: HBase
>          Issue Type: Sub-task
>          Components: Compaction
>    Affects Versions: 2.0.0, 1.3.0, 0.98.18, 1.4.0
>            Reporter: Duo Zhang
>            Assignee: Duo Zhang
>             Fix For: 2.0.0, 1.3.0, 0.98.19, 1.4.0
>         Attachments: HBASE-15454-v1.patch, HBASE-15454.patch
> Sometimes the old data is rarely touched but we cannot remove it. So archive it to several
> big files (by year or something) and use EC to reduce the redundancy.

This message was sent by Atlassian JIRA
