hbase-issues mailing list archives

From "zhoushuaifeng (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-3969) Outdated data can not be cleaned in time
Date Fri, 17 Jun 2011 01:46:47 GMT

    [ https://issues.apache.org/jira/browse/HBASE-3969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13050850#comment-13050850 ]

zhoushuaifeng commented on HBASE-3969:

Hi st, I don't do any deletes; I only set the TTL of the table to a few days. When the data's
timestamps are older than the TTL, the data should be cleaned up by a major compaction. But if
a region has had no new data inserted for a while, it will contain only 1 or 2 files, so its
priority is very low. If there is high throughput, the major compaction will be delayed.
About YCSB: we have made some changes to it, so it can load data the way we want (the key,
value, and speed are all customized). The scans are random, so they have a chance of hitting
regions where a lot of data is outdated but has not been cleaned up in time.
I think there is no need to check for hbase.hstore.blockingStoreFiles > hbase.hstore.compactionThreshold,
because the priority can be negative. When there are more files in the store than blockingStoreFiles,
the flush operation will trigger a compaction, and the priority will be negative; we have seen
that before. A negative priority means that there are too many files in the store, and
flushes of this store may be blocked for at most 90 seconds. We have seen this in the logs.
Also, the user must not set hbase.hstore.blockingStoreFiles <= hbase.hstore.compactionThreshold;
if so, blocking will always happen.
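
To make the arithmetic above concrete, here is a minimal sketch. It assumes the priority of a
store is blockingStoreFiles minus its current store file count, with smaller numbers served
first. The class and method names are hypothetical and are not HBase source; the value 7 for
blockingStoreFiles is only an example.

```java
// Hypothetical sketch of the priority arithmetic discussed in this comment.
// Assumption: priority = blockingStoreFiles - storeFileCount, smaller value = more urgent.
final class CompactPrioritySketch {

    /** Returns the assumed compaction priority of a store; it can be negative. */
    static int priority(int blockingStoreFiles, int storeFileCount) {
        return blockingStoreFiles - storeFileCount;
    }

    /** True when the store holds more files than blockingStoreFiles, so flushes may be held back. */
    static boolean flushMayBlock(int blockingStoreFiles, int storeFileCount) {
        return priority(blockingStoreFiles, storeFileCount) < 0;
    }

    public static void main(String[] args) {
        int blockingStoreFiles = 7; // example value only
        // An idle region with 2 files gets a large number, i.e. a low priority.
        System.out.println("idle region priority: " + priority(blockingStoreFiles, 2));
        // A busy region with 9 files gets a negative number, i.e. the most urgent class.
        System.out.println("busy region priority: " + priority(blockingStoreFiles, 9));
        System.out.println("busy region may block flushes: " + flushMayBlock(blockingStoreFiles, 9));
    }
}
```

Under this assumption a region with only 1 or 2 files always sorts behind regions that are
actively accumulating files, which is the delay described above.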

> Outdated data can not be cleaned in time
> ----------------------------------------
>                 Key: HBASE-3969
>                 URL: https://issues.apache.org/jira/browse/HBASE-3969
>             Project: HBase
>          Issue Type: Improvement
>          Components: regionserver
>    Affects Versions: 0.90.1, 0.90.2, 0.90.3
>            Reporter: zhoushuaifeng
>             Fix For: 0.90.4
>         Attachments: HBASE-3969-solution1-for-branch.patch, HBASE-3969-solution1.patch
> The compaction checker sends regions to the compaction queue to be compacted. But the priority
> of these regions is too low if they have only a few store files. When there is high
> throughput, the compaction queue will always have some regions with higher priority. This
> may cause the major compaction to be delayed for a long time (even a few days), and cleaning
> of outdated data will also be delayed.
> In our test case, we found some regions that were sent to the queue by the major compaction
> checker hanging in the queue for more than 2 days! Some scanners on these regions could not
> get any available data for a long time, and their leases expired.
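
The starvation described in the issue can be reproduced with a toy priority queue. This is
only a sketch under stated assumptions: one request is served per tick, one higher-priority
request arrives per tick while the load lasts, and smaller numbers are served first. All names
are hypothetical and none of this is HBase code.

```java
import java.util.PriorityQueue;

// Toy model of a compaction queue in which a low-priority request waits behind busy regions.
final class CompactQueueStarvationSketch {

    /**
     * Returns the number of ticks until the idle region's request is served.
     * Each queue entry is {priority, isIdleRegion}; smaller priority values are polled first.
     */
    static int ticksUntilServed(int idlePriority, int busyPriority, int busyArrivals) {
        PriorityQueue<int[]> queue =
                new PriorityQueue<>((a, b) -> Integer.compare(a[0], b[0]));
        queue.add(new int[] {idlePriority, 1});
        int tick = 0;
        while (true) {
            if (tick < busyArrivals) {
                // While the load lasts, one request from a busy region arrives per tick.
                queue.add(new int[] {busyPriority, 0});
            }
            int[] served = queue.poll(); // never null: the idle request is still queued
            tick++;
            if (served[1] == 1) {
                return tick;
            }
        }
    }

    public static void main(String[] args) {
        // With no competing load the idle region is served immediately.
        System.out.println(ticksUntilServed(5, 1, 0));
        // With sustained load it waits until every busy request has been served.
        System.out.println(ticksUntilServed(5, 1, 1000));
    }
}
```

In this model the wait grows linearly with the length of the load, so a region with few store
files stays queued for as long as busier regions keep arriving.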

This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

