Date: Fri, 17 Jun 2011 01:46:47 +0000 (UTC)
From: "zhoushuaifeng (JIRA)" <jira@apache.org>
To: issues@hbase.apache.org
Subject: [jira] [Commented] (HBASE-3969) Outdated data can not be cleaned in time


    [ https://issues.apache.org/jira/browse/HBASE-3969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13050850#comment-13050850 ] 

zhoushuaifeng commented on HBASE-3969:
--------------------------------------

Hi st, I don't do any deletes; I only set the TTL of the table to a few days. When data's timestamps are older than the TTL, that data should be cleaned up by a major compaction. But if a region has had no new data inserted for a while, it will have only 1 or 2 store files, so its compaction priority is very low. Under heavy write throughput, its major compaction will then be delayed.
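A minimal sketch of the expiry test that the TTL implies. It assumes, as in HBase, that the column family TTL is configured in seconds and that cell timestamps are epoch milliseconds. `TtlCheck` and `isExpired` are hypothetical names, not HBase APIs. The check only decides whether a cell is past its TTL. As described above, the cell still occupies the store files until a major compaction rewrites them.

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical helper, not an HBase API: is a cell older than the family's TTL? */
public class TtlCheck {
    // ttlSeconds: column family TTL in seconds; timestamps are epoch milliseconds.
    static boolean isExpired(long cellTimestampMs, long ttlSeconds, long nowMs) {
        long ttlMs = TimeUnit.SECONDS.toMillis(ttlSeconds);
        return nowMs - cellTimestampMs > ttlMs;
    }

    public static void main(String[] args) {
        long now = System.currentTimeMillis();
        long ttl = TimeUnit.DAYS.toSeconds(3);                // "a few days"
        long fourDaysAgo = now - TimeUnit.DAYS.toMillis(4);
        System.out.println(isExpired(fourDaysAgo, ttl, now)); // true
    }
}
```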
About YCSB: we have made some changes to it so that it can load data however we want (the keys, values, and load rate are all customizable). The scans are random, so they can land on regions that hold a lot of outdated data that has not yet been cleaned.
I think there is no need to check that hbase.hstore.blockingStoreFiles > hbase.hstore.compactionThreshold, because the priority can be negative. When a store has more files than blockingStoreFiles, a flush will trigger a compaction with a negative priority. A negative priority means that there are too many files in the store, and flushes of this store may be blocked for up to 90 seconds. We have seen this in the logs before.
Also, users must not set hbase.hstore.blockingStoreFiles <= hbase.hstore.compactionThreshold; if they do, blocking will always happen.
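A small sketch of the priority arithmetic discussed above. It assumes a store's compaction priority is hbase.hstore.blockingStoreFiles minus the store's current file count, with lower values served first. That is consistent with the negative priorities described here, but treat it as an assumption rather than the exact HBase code. `CompactionPriority`, `priority`, and `mayBlockFlush` are hypothetical names, and 7 is only an example value. A store with 1 or 2 files gets priority 6 or 5, so it sorts behind any store holding more files.

```java
/**
 * Hypothetical sketch, not HBase source. Assumption:
 * priority = blockingStoreFiles - storeFileCount (lower = more urgent).
 */
public class CompactionPriority {
    static int priority(int blockingStoreFiles, int storeFileCount) {
        return blockingStoreFiles - storeFileCount;
    }

    // Negative priority: the store already has more files than
    // blockingStoreFiles, so flushes for it may be held back.
    static boolean mayBlockFlush(int blockingStoreFiles, int storeFileCount) {
        return priority(blockingStoreFiles, storeFileCount) < 0;
    }

    public static void main(String[] args) {
        int blockingStoreFiles = 7; // example value only
        for (int files : new int[] {1, 2, 5, 9}) {
            System.out.printf("files=%d priority=%d mayBlockFlush=%b%n",
                files, priority(blockingStoreFiles, files),
                mayBlockFlush(blockingStoreFiles, files));
        }
    }
}
```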


> Outdated data can not be cleaned in time
> ----------------------------------------
>
>                 Key: HBASE-3969
>                 URL: https://issues.apache.org/jira/browse/HBASE-3969
>             Project: HBase
>          Issue Type: Improvement
>          Components: regionserver
>    Affects Versions: 0.90.1, 0.90.2, 0.90.3
>            Reporter: zhoushuaifeng
>             Fix For: 0.90.4
>
>         Attachments: HBASE-3969-solution1-for-branch.patch, HBASE-3969-solution1.patch
>
>
> The compaction checker sends regions to the compaction queue. But if a region has only a few store files, its priority is too low. Under heavy write throughput, the compaction queue always holds some regions with higher priority, which can delay a major compaction for a long time (even a few days), and the cleaning of outdated data is delayed along with it.
> In our test case, we found some regions that the major compaction checker had sent to the queue still waiting there after more than 2 days! Some scanners on these regions could not get any available data for a long time, and their leases expired.
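A toy model of the starvation described above, using `java.util.PriorityQueue`. It is not HBase's CompactSplitThread or its actual queue class. The assumptions are hypothetical: one compaction thread takes one request per tick, lower priority values are served first (FIFO among equal priorities), and a busy region with a better priority arrives every tick. Under those assumptions the idle region queued by the major compaction checker is never served.

```java
import java.util.Comparator;
import java.util.PriorityQueue;

/** Toy model, not HBase code: a low-priority request starving in a priority queue. */
public class QueueStarvation {
    static final class Request {
        final String region;
        final int priority; // lower value = served first
        final long seq;     // arrival order, breaks ties FIFO
        Request(String region, int priority, long seq) {
            this.region = region;
            this.priority = priority;
            this.seq = seq;
        }
    }

    static final Comparator<Request> BY_PRIORITY_THEN_ARRIVAL =
        Comparator.comparingInt((Request r) -> r.priority).thenComparingLong(r -> r.seq);

    /** Returns the tick at which the idle region is compacted, or -1 if it never is. */
    static int ticksUntilServed(int idlePriority, int busyPriority, int maxTicks) {
        PriorityQueue<Request> queue = new PriorityQueue<>(BY_PRIORITY_THEN_ARRIVAL);
        long seq = 0;
        queue.add(new Request("idle-region", idlePriority, seq++));
        for (int tick = 0; tick < maxTicks; tick++) {
            // Heavy load: a busy region is queued every tick ...
            queue.add(new Request("busy-" + tick, busyPriority, seq++));
            // ... and the single compaction thread handles one request.
            Request next = queue.poll();
            if (next.region.equals("idle-region")) {
                return tick;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Example priorities: idle store with 1 file (7 - 1 = 6), busy store with 5 files (7 - 5 = 2).
        System.out.println(ticksUntilServed(6, 2, 1000));
    }
}
```

The strict priority ordering is what makes the idle region wait indefinitely. As long as better-priority requests keep arriving at least as fast as they are drained, nothing in this model ever lets the low-priority request reach the head of the queue.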

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira