hbase-dev mailing list archives

From "Bryan Duxbury (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HBASE-47) option to set TTL for columns in hbase
Date Thu, 01 May 2008 06:08:55 GMT

    [ https://issues.apache.org/jira/browse/HBASE-47?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12593494#action_12593494
] 

Bryan Duxbury commented on HBASE-47:
------------------------------------

HColumnDescriptor:
 * Update HColumnDescriptor's javadoc to reference HConstants.FOREVER instead of the actual
raw value.

Memcache:
 * Instead of having memcache take an HStore, could we just make it take a TTL? A memcache
is only ever for a single column family, and TTL is configured at the family level. I just
don't like the "if (store != null)" stuff that it does now. The default constructor can just
pass a FOREVER TTL to the other constructor. 
 * Instead of doing the math to turn TTL seconds into milliseconds during every call, can
we move it into the constructor?
 * Line 314: even though it's only a debug message, you should wrap the line with "if (LOG.isDebugEnabled())
{...}". Even though the line won't make it out to the log, the message string will still get built,
and we'd like to avoid that when we're not in debug.
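
To make the three Memcache points concrete, here is a minimal sketch. It is not the HBase
Memcache class: the class name, the isExpired method and the FOREVER value of -1 are all
assumptions made for illustration, and java.util.logging stands in for the project's logger
(LOG.isLoggable(Level.FINE) plays the role of LOG.isDebugEnabled()).

```java
import java.util.logging.Level;
import java.util.logging.Logger;

/** Hypothetical stand-in for the Memcache discussed above; not the real HBase class. */
class TtlMemcacheSketch {
  private static final Logger LOG =
      Logger.getLogger(TtlMemcacheSketch.class.getName());

  /** Assumed "never expire" sentinel, playing the role of HConstants.FOREVER. */
  static final int FOREVER = -1;

  /** TTL in milliseconds, computed once here rather than on every call. */
  private final long ttlMillis;

  /** The default constructor just passes FOREVER to the other constructor. */
  TtlMemcacheSketch() {
    this(FOREVER);
  }

  /** Takes the family-level TTL in seconds; no store reference is needed. */
  TtlMemcacheSketch(int ttlSeconds) {
    this.ttlMillis =
        (ttlSeconds == FOREVER) ? Long.MAX_VALUE : ttlSeconds * 1000L;
  }

  /** Returns true when a cell written at cellTimestamp is older than the TTL. */
  boolean isExpired(long cellTimestamp, long now) {
    if (ttlMillis == Long.MAX_VALUE) {
      return false; // FOREVER: nothing ever expires
    }
    boolean expired = now - cellTimestamp > ttlMillis;
    // Guard the debug message so the string is only built when it will be logged.
    if (expired && LOG.isLoggable(Level.FINE)) {
      LOG.fine("expired cell: timestamp=" + cellTimestamp + ", now=" + now);
    }
    return expired;
  }
}
```

With the TTL passed in directly, there is no "if (store != null)" branch left to maintain,
and the seconds-to-milliseconds conversion happens exactly once per instance.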

HStore
 * Same as above regarding TTL seconds to milliseconds.

Some small items:
 * There are tabs in this patch - please replace them with two spaces.
 * I like the test, but I'm loath to include another 70 seconds in the test suite. Can you
put the values in the past and see if they are immediately screened out? That would make it
a very fast test.
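
A rough sketch of what I mean by putting the values in the past. The TtlFilterSketch class
below is a hypothetical, self-contained stand-in for the store under test, not code from
the patch; the point is only that the test writes a cell whose timestamp is already older
than the TTL, so nothing has to sleep.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

/** Hypothetical stand-in for a TTL-aware store; not code from the patch. */
class TtlFilterSketch {
  private final long ttlMillis;
  // timestamp -> value, sorted so that expired cells form a prefix of the map
  private final TreeMap<Long, String> cells = new TreeMap<>();

  TtlFilterSketch(int ttlSeconds) {
    this.ttlMillis = ttlSeconds * 1000L;
  }

  void put(long timestamp, String value) {
    cells.put(timestamp, value);
  }

  /** Returns only the values whose timestamps are within the TTL as of now. */
  List<String> read(long now) {
    return new ArrayList<>(cells.tailMap(now - ttlMillis, true).values());
  }
}

class TtlFilterSketchDemo {
  public static void main(String[] args) {
    long now = System.currentTimeMillis();
    TtlFilterSketch store = new TtlFilterSketch(5); // 5 second TTL

    store.put(now - 10_000L, "stale"); // written "10 seconds ago": already expired
    store.put(now, "fresh");           // written "now": still live

    // The stale value is screened out immediately, with no waiting.
    System.out.println(store.read(now));
  }
}
```

The same idea should carry over to the real test: insert with explicit timestamps older
than the family TTL and assert that a get or scan returns nothing for them.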

Overall I really like this patch. This is an impressive feature to take on and it's been done
admirably. Thanks Andrew!
 

> option to set TTL for columns in hbase
> --------------------------------------
>
>                 Key: HBASE-47
>                 URL: https://issues.apache.org/jira/browse/HBASE-47
>             Project: Hadoop HBase
>          Issue Type: New Feature
>          Components: hql, regionserver
>            Reporter: Billy Pearson
>            Priority: Minor
>         Attachments: hbase-ttl-0.2-r652401.patch
>
>
> I would like to see an option to set a TTL on columns in hbase. This feature could be
> helpful in removing stale data from large datasets without having to do a full scan
> of the dataset and then issue deletes.
> Examples:
> Say I am crawling pages and only refreshing pages based on a set score. If some pages
> do not get updated within X days, the old version of the page gets removed from the data set.
>
> Say I am stripping links out of HTML and storing them. If a link is removed from a page,
> I would need to issue a delete statement to remove that link from the data set. With
> a TTL, the link data would remove itself if not updated in X seconds. These are just examples
> based on crawling, like Nutch, but I can foresee many apps using this option.
> This is a feature in Bigtable that is handled when Bigtable does garbage collection.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

