hbase-dev mailing list archives

From Zhoushuaifeng <zhoushuaif...@huawei.com>
Subject RE: Outdated data can not be cleaned in time
Date Thu, 09 Jun 2011 04:14:12 GMT
Hi St,
My comments are inline below, starting with //zhou.

-----Original Message-----
From: saint.ack@gmail.com [mailto:saint.ack@gmail.com] On Behalf Of Stack
Sent: Thursday, June 09, 2011 11:32 AM
To: dev@hbase.apache.org
Cc: Yanlijun; Chenjian
Subject: Re: Outdated data can not be cleaned in time

On Tue, Jun 7, 2011 at 12:41 AM, Zhoushuaifeng <zhoushuaifeng@huawei.com> wrote:
> https://issues.apache.org/jira/browse/HBASE-3723
> This issue is fixed and committed to TRUNK, but not integrated into 0.90.2 and 0.90.3;
> this causes outdated data not to be cleaned in time.

Let me commit to branch.  It's a small change.

Thanks, it's important. 

> For more, the compaction checker sends regions to the compact queue for compaction. But
> the priority of these regions is too low if they have only a few storefiles. Under high
> throughput, the compact queue will always contain some regions with higher priority.
> This may cause the major compaction to be delayed for a long time (even a few days),
> and outdated-data cleaning will also be delayed.
> If so, I suggest that the compaction checker send regions needing a major compaction to
> the compact queue with higher priority.
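The starvation described above can be sketched as follows. This is an illustrative model, not HBase code: it assumes the 0.90-style priority of blockingStoreFiles minus the current store-file count, with lower values served first.

```java
import java.util.Comparator;
import java.util.PriorityQueue;

public class CompactQueueSketch {
    static final int BLOCKING_STORE_FILES = 7; // hbase.hstore.blockingStoreFiles default

    // Assumed priority rule: fewer files -> larger value -> served later.
    static int priority(int storeFileCount) {
        return BLOCKING_STORE_FILES - storeFileCount;
    }

    public static void main(String[] args) {
        // Each entry is {regionId, priority}; the queue serves the smallest priority first.
        PriorityQueue<int[]> queue =
            new PriorityQueue<>(Comparator.comparingInt(r -> r[1]));
        queue.add(new int[]{1, priority(2)}); // major-compaction candidate, 2 files -> 5
        queue.add(new int[]{2, priority(6)}); // busy region, 6 files -> 1
        queue.add(new int[]{3, priority(5)}); // busy region, 5 files -> 2
        // Under heavy writes, regions like 2 and 3 keep arriving, so the
        // major-compaction candidate (region 1) stays stuck at the tail.
        System.out.println(queue.poll()[0]); // region 2 is compacted first
    }
}
```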

I'd think that a region with more storefiles should take priority over
regions with a few files, even if the latter are due for a major compaction.

//zhou: I agree that regions with more files should take higher priority, but other important
factors should be considered too. In our test case, we found some regions sent to the queue
by the major compaction checker hanging in the queue for more than 2 days! Scanners on these
regions could not get valid data for a long time, and their leases expired.
I think setting these regions' priority to hbase.hstore.blockingStoreFiles -
hbase.hstore.compactionThreshold - 1 by default may be a good way to solve this problem. If a
region has fewer than 3 files, its priority is lower than the outdated regions'; if it has
more than 4 files, its priority is higher. This setting can solve the outdated-data problem
and will not block flushes and puts.
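Concretely, with the default values (blockingStoreFiles = 7, compactionThreshold = 3), the proposed default works out like this. The class and method names are illustrative, not HBase code; only the two configuration defaults come from the discussion above.

```java
public class MajorCompactPriority {
    static final int BLOCKING_STORE_FILES = 7;  // hbase.hstore.blockingStoreFiles default
    static final int COMPACTION_THRESHOLD = 3;  // hbase.hstore.compactionThreshold default

    // Proposed default priority for regions queued by the major-compaction checker.
    static int majorCheckPriority() {
        return BLOCKING_STORE_FILES - COMPACTION_THRESHOLD - 1; // 7 - 3 - 1 = 3
    }

    // Assumed size-based priority for ordinary requests (lower = more urgent).
    static int sizePriority(int storeFileCount) {
        return BLOCKING_STORE_FILES - storeFileCount;
    }

    public static void main(String[] args) {
        // A region with few files no longer starves the major-compaction candidate:
        System.out.println(sizePriority(3) > majorCheckPriority()); // 4 > 3 -> true
        // but a region approaching the blocking limit still goes first,
        // so flushes and puts are not blocked:
        System.out.println(sizePriority(5) < majorCheckPriority()); // 2 < 3 -> true
    }
}
```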

I can understand that if there are a lot of deletes in a
store, a major compaction could make a big difference, but do you think
this is the usual case?

Maybe the compaction algorithm should consider the age of compaction requests too?
 If a compaction has been hanging in the queue a good while, its priority
gets bumped a level?
//zhou: This is good, I totally agree. This is another good way to solve the problem. But
for now, I don't know how to make the patch; maybe we can dig more into it.
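One way the age-based bump could look: each request records its enqueue time, and its effective priority improves one level per interval waited. Everything here is an assumption for discussion (the names, the 10-minute interval), not a patch against HBase.

```java
public class AgingCompactQueue {
    // Assumption: a queued request gains one priority level every 10 minutes.
    static final long BUMP_INTERVAL_MS = 10 * 60 * 1000L;

    static class Request {
        final String region;
        final int basePriority;   // lower = more urgent
        final long enqueuedAtMs;

        Request(String region, int basePriority, long enqueuedAtMs) {
            this.region = region;
            this.basePriority = basePriority;
            this.enqueuedAtMs = enqueuedAtMs;
        }

        // Each full interval waited subtracts one level, raising urgency.
        int effectivePriority(long nowMs) {
            long bumps = (nowMs - enqueuedAtMs) / BUMP_INTERVAL_MS;
            return basePriority - (int) bumps;
        }
    }

    public static void main(String[] args) {
        long now = 0;
        Request stuck = new Request("stuck-major", 5, now - 3 * BUMP_INTERVAL_MS);
        Request fresh = new Request("new-minor", 3, now);
        // After waiting three intervals, the stuck request overtakes the fresh one.
        System.out.println(stuck.effectivePriority(now)); // 5 - 3 = 2
        System.out.println(fresh.effectivePriority(now)); // 3
    }
}
```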

