hbase-issues mailing list archives

From "Jimmy Hu (JIRA)" <j...@apache.org>
Subject [jira] Created: (HBASE-2999) hbase TTL behavior depends on key design
Date Wed, 15 Sep 2010 18:23:32 GMT
hbase TTL behavior depends on key design
----------------------------------------

                 Key: HBASE-2999
                 URL: https://issues.apache.org/jira/browse/HBASE-2999
             Project: HBase
          Issue Type: Bug
          Components: master
    Affects Versions: 0.89.20100621
         Environment: All
            Reporter: Jimmy Hu


Yes, the current compaction-based TTL works as advertised if the row key
distributes incoming data randomly among all regions. However, if the key is
designed in chronological order, TTL doesn't really work: new writes go only
to the newest region, and no compaction will happen on the regions holding
data that has already been written. So we can't say that the current TTL
really works as advertised, since its behavior depends on the key structure.
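A minimal Python model (not HBase code) may help show why key design matters. Regions are modelled as contiguous, sorted row-key ranges. `region_for`, `salted_key`, the split points, and the 4-bucket salt are all hypothetical. With a zero-padded timestamp as the row key, every new write lands in the last region, so under the premise above the older regions never receive the writes that would lead to a compaction. A salted key spreads the same writes across all regions.

```python
import bisect
from collections import Counter
from typing import Callable, List


def region_for(row_key: str, split_keys: List[str]) -> int:
    """Index of the region containing row_key.

    Region 0 holds keys below split_keys[0]; region i holds keys in
    [split_keys[i-1], split_keys[i]); the last region holds the rest.
    """
    return bisect.bisect_right(split_keys, row_key)


def chronological_key(ts: int) -> str:
    # Zero-padded timestamp: string order equals time order.
    return f"{ts:010d}"


def salted_key(ts: int, buckets: int = 4) -> str:
    # Hypothetical salt: a bucket prefix spreads consecutive timestamps.
    return f"{ts % buckets}-{ts:010d}"


def writes_per_region(make_key: Callable[[int], str], split_keys: List[str],
                      timestamps: range) -> Counter:
    """Count how many new writes land in each region."""
    return Counter(region_for(make_key(ts), split_keys) for ts in timestamps)


# Hypothetical splits made while older data (ts 0..399) arrived.
CHRONO_SPLITS = [chronological_key(t) for t in (100, 200, 300)]
SALTED_SPLITS = ["1", "2", "3"]
NEW_WRITES = range(400, 500)

if __name__ == "__main__":
    # Only the newest region (index 3) receives writes.
    print(writes_per_region(chronological_key, CHRONO_SPLITS, NEW_WRITES))
    # Writes are spread evenly over all four regions.
    print(writes_per_region(salted_key, SALTED_SPLITS, NEW_WRITES))
```

The model only counts writes; it does not simulate flushes or compactions themselves.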

This is a pity, because a major use case for HBase is storing history or log
data. Normally people only want to retain such data for a fixed period; for
example, the US government's default data retention policy is 7 years. This
data is saved in chronological order, and the current TTL implementation
doesn't work at all for this kind of use case.

For that use case to really work, HBase needs an active thread that
periodically runs, checks whether there is data older than the TTL, deletes
that data if necessary, and merges small regions older than a certain age
into larger ones to save system resources. It can optimize the deletion by
deleting the whole region if it detects that the latest timestamp in the
region is older than the TTL. The following should be configurable in HBase:

1. Whether to enable or disable the TTL thread.
2. The interval at which the TTL thread runs. Maybe we can use a special value
like 0 to indicate that the TTL thread should not run, saving one
configuration parameter. The default interval should probably be 1 day.
3. How small regions must be before they are merged. This should be a
percentage of the store size; for example, if 2 consecutive regions are only
10% of the store size (default is 256 MB), we can initiate a region merge. We
probably also need a parameter to limit merging; for example, only merge
regions whose largest timestamp is older than half of the TTL.
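A rough Python model of one pass of the proposed TTL thread may make these parameters concrete. It is a sketch under stated assumptions, not HBase code. `TtlChoreConfig`, `Region`, `run_once`, and every field name are hypothetical. An interval of 0 folds item 1 into item 2, and the default interval is 1 day. Item 3 is read as "two adjacent regions whose combined size is at most 10% of 256 MB and whose newest cells are older than half the TTL". Scheduling the pass and performing the actual deletes and merges are left out.

```python
from dataclasses import dataclass
from typing import List, Tuple

DAY_MS = 24 * 60 * 60 * 1000
MB = 1024 * 1024


@dataclass
class TtlChoreConfig:
    # Items 1 and 2 combined: run interval in ms, where 0 disables the thread.
    interval_ms: int = DAY_MS
    # Item 3 (assumed reading): merge two adjacent regions when their combined
    # size is at most this fraction of the maximum store size.
    merge_size_fraction: float = 0.10
    max_store_size_bytes: int = 256 * MB
    # Only merge regions whose newest cell is older than this fraction of the TTL.
    merge_age_fraction: float = 0.5

    @property
    def enabled(self) -> bool:
        return self.interval_ms > 0


@dataclass
class Region:
    start_key: str
    size_bytes: int
    max_timestamp_ms: int  # newest cell timestamp anywhere in the region


def expired_regions(regions: List[Region], ttl_ms: int, now_ms: int) -> List[Region]:
    """Regions whose newest cell is past the TTL can be deleted as a whole."""
    cutoff = now_ms - ttl_ms
    return [r for r in regions if r.max_timestamp_ms < cutoff]


def merge_candidates(regions: List[Region], ttl_ms: int, now_ms: int,
                     cfg: TtlChoreConfig) -> List[Tuple[Region, Region]]:
    """Adjacent pairs (regions sorted by key) that are small and old enough.

    Pairs may overlap; a real implementation would pick non-overlapping ones.
    """
    size_limit = cfg.merge_size_fraction * cfg.max_store_size_bytes
    age_cutoff = now_ms - cfg.merge_age_fraction * ttl_ms
    pairs = []
    for left, right in zip(regions, regions[1:]):
        small = left.size_bytes + right.size_bytes <= size_limit
        old = max(left.max_timestamp_ms, right.max_timestamp_ms) < age_cutoff
        if small and old:
            pairs.append((left, right))
    return pairs


def run_once(regions: List[Region], ttl_ms: int, now_ms: int,
             cfg: TtlChoreConfig) -> Tuple[List[Region], List[Tuple[Region, Region]]]:
    """One pass of the hypothetical TTL thread: (regions to drop, pairs to merge)."""
    if not cfg.enabled:
        return [], []
    to_drop = expired_regions(regions, ttl_ms, now_ms)
    # Assumes each dropped region's key range is absorbed by a neighbour,
    # so the surviving regions are treated as adjacent.
    survivors = [r for r in regions if r.max_timestamp_ms >= now_ms - ttl_ms]
    return to_drop, merge_candidates(survivors, ttl_ms, now_ms, cfg)


if __name__ == "__main__":
    ttl, now = 10 * DAY_MS, 100 * DAY_MS
    regions = [
        Region("a", 5 * MB, 50 * DAY_MS),    # newest cell 50 days old: expired
        Region("b", 10 * MB, 91 * DAY_MS),   # small, older than TTL/2
        Region("c", 10 * MB, 93 * DAY_MS),   # small, older than TTL/2
        Region("d", 200 * MB, 99 * DAY_MS),  # large and recent
    ]
    drop, merge = run_once(regions, ttl, now, TtlChoreConfig())
    print([r.start_key for r in drop])
    print([(left.start_key, right.start_key) for left, right in merge])
```

The whole-region check only needs each region's newest timestamp, which is what makes the optimization described above cheap compared with scanning cells.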

We currently track min/max timestamps in store files, so it's possible that
we could also expire some files of a region, even if the region as a whole
has not expired. So, at minimum, we should be able to implement dropping the
store files that are older than the TTL. If all store files for a region are
dropped, we should drop the whole region and update the key range of the
adjacent region, so that no hole is left in the key space.
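The store-file idea can be sketched the same way. The Python below is a hypothetical model, not HBase's store file or region API. Each file carries the min/max timestamps mentioned above, and a file is dropped once its newest cell is older than the TTL. A region left with no files is removed by widening a neighbour's key range. As assumptions, end keys are exclusive and `""` marks the table's start or end.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class StoreFile:
    min_ts: int  # oldest cell timestamp (tracked, but not needed for expiry)
    max_ts: int  # newest cell timestamp; decides whether the file has expired


@dataclass
class Region:
    start_key: str  # inclusive; "" means start of table
    end_key: str    # exclusive; "" means end of table
    files: List[StoreFile] = field(default_factory=list)


def expire_files(region: Region, ttl: int, now: int) -> int:
    """Drop files whose newest cell is older than the TTL; return the count dropped."""
    cutoff = now - ttl
    kept = [f for f in region.files if f.max_ts >= cutoff]
    dropped = len(region.files) - len(kept)
    region.files = kept
    return dropped


def expire_table(regions: List[Region], ttl: int, now: int) -> List[Region]:
    """Expire files in every region (sorted by key), then drop emptied regions.

    A dropped region's key range is absorbed by a neighbour, so the surviving
    regions still cover the whole key space with no hole.
    """
    for region in regions:
        expire_files(region, ttl, now)

    survivors: List[Region] = []
    pending_start: Optional[str] = None  # start key of leading dropped regions
    for region in regions:
        if not region.files:
            if survivors:
                survivors[-1].end_key = region.end_key  # previous region grows forward
            elif pending_start is None:
                pending_start = region.start_key
            continue
        if pending_start is not None:
            region.start_key = pending_start  # first survivor grows backward
            pending_start = None
        survivors.append(region)

    if not survivors and regions:
        # Everything expired: keep one empty region spanning the whole table.
        return [Region(regions[0].start_key, regions[-1].end_key)]
    return survivors


if __name__ == "__main__":
    regions = [
        Region("", "b", [StoreFile(10, 20)]),                      # fully expired
        Region("b", "c", [StoreFile(10, 20), StoreFile(85, 95)]),  # partly expired
        Region("c", "d", [StoreFile(30, 40)]),                     # fully expired
        Region("d", "", [StoreFile(95, 99)]),                      # still live
    ]
    for r in expire_table(regions, ttl=10, now=100):
        print(repr(r.start_key), repr(r.end_key), len(r.files))
```

Dropping whole files only needs the per-file maximum timestamp; a file that straddles the cutoff is kept, and its expired cells would still be removed by a normal compaction.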



 


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

