hbase-issues mailing list archives

From "Yu Li (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-17215) Separate small/large file delete threads in HFileCleaner to accelerate archived hfile cleanup speed
Date Thu, 30 Mar 2017 04:16:42 GMT

    [ https://issues.apache.org/jira/browse/HBASE-17215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15948399#comment-15948399

Yu Li commented on HBASE-17215:

bq. hbase.hfile.compaction.discharger.interval default to be lowered in ur view?
I'm not sure, since {{hbase.hfile.compaction.discharger.interval}} is a regionserver-side
setting while {{hbase.master.cleaner.interval}} is on the master side. Considering that multiple
RSes archive files, I think the master should run at a higher rate. Wdyt? [~anoop.hbase]

bq. This is not PCIe-SSD, for some reason that in archive...
I'm very much interested in this "some reason" (smile). Have you ever dug into it, and would
you mind sharing if so? Thanks. [~huaxiang]

bq. I was thinking why would a thread be dedicated to cleaning up small files?
I simply borrowed the idea from compaction, where we have different threads dealing with different
sizes. Also, there are more small hfiles than large ones in the common case, which is similar to
the case in compaction. In our case the root cause of the issue was (truly) that too many small
files blocked large file deletion and thus slowed down freeing disk space, but isn't it also
possible for too many large files to block small ones, although that's a corner case? [~tedyu]
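
To make the idea concrete, here is a minimal sketch of size-based dispatch to two delete threads. It is not the attached patch: the class name, the {{sizeThreshold}} parameter and the in-memory "deleted" lists (which stand in for real filesystem deletes) are all hypothetical and only illustrate the approach.

```java
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Illustrative sketch only; not the HBASE-17215 patch. All names are hypothetical.
class SizeSplitCleanerSketch {
    private final long sizeThreshold; // bytes; files at or above this count as "large"
    private final BlockingQueue<String> largeQueue = new LinkedBlockingQueue<>();
    private final BlockingQueue<String> smallQueue = new LinkedBlockingQueue<>();
    // Records which thread handled which file; stands in for a real delete call.
    final List<String> deletedByLarge = new CopyOnWriteArrayList<>();
    final List<String> deletedBySmall = new CopyOnWriteArrayList<>();
    private volatile boolean running = true;
    private Thread largeThread;
    private Thread smallThread;

    SizeSplitCleanerSketch(long sizeThreshold) {
        this.sizeThreshold = sizeThreshold;
    }

    // Start one dedicated thread per size class.
    void start() {
        largeThread = new Thread(() -> drain(largeQueue, deletedByLarge), "large-file-delete");
        smallThread = new Thread(() -> drain(smallQueue, deletedBySmall), "small-file-delete");
        largeThread.start();
        smallThread.start();
    }

    // Route a file to the queue matching its size, so neither class can starve the other.
    void submit(String name, long length) {
        (length >= sizeThreshold ? largeQueue : smallQueue).add(name);
    }

    // Worker loop: keeps polling until stop() was called and the queue is empty.
    private void drain(BlockingQueue<String> queue, List<String> deleted) {
        try {
            while (running || !queue.isEmpty()) {
                String name = queue.poll(20, TimeUnit.MILLISECONDS);
                if (name != null) {
                    deleted.add(name);
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Signal shutdown and wait for both threads to finish their remaining work.
    void stop() {
        running = false;
        try {
            largeThread.join();
            smallThread.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        SizeSplitCleanerSketch cleaner = new SizeSplitCleanerSketch(100);
        cleaner.start();
        cleaner.submit("small-1", 10);
        cleaner.submit("large-1", 500);
        cleaner.stop();
        System.out.println("small thread deleted: " + cleaner.deletedBySmall);
        System.out.println("large thread deleted: " + cleaner.deletedByLarge);
    }
}
```

With this split, a backlog in one queue delays only files of that size class, which is the trade-off asked about above.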

bq. can these two threads share a LinkedBlockingDeque?
Yes, we could use {{StealJobQueue}} directly, as compaction does, but do you mind if I open another
thread for this improvement? The current implementation has been verified online, so I'm
more sure about its stability. [~huaxiang]
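
For readers unfamiliar with the stealing idea, a rough sketch using plain JDK queues follows. It is not HBase's {{StealJobQueue}} and does not reproduce its API; the class and method names are hypothetical. It only shows the fallback order: a worker takes from its own queue first and, when that is empty, takes from the other queue.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Illustrative sketch only; this is NOT HBase's StealJobQueue. Names are hypothetical.
class StealingPollSketch {
    // Returns the next job for a worker: its own queue first, then the other queue.
    // Returns null when both queues are empty.
    static <T> T nextJob(BlockingQueue<T> own, BlockingQueue<T> stealFrom) {
        T job = own.poll();
        return job != null ? job : stealFrom.poll();
    }

    public static void main(String[] args) {
        BlockingQueue<String> largeFiles = new LinkedBlockingQueue<>();
        BlockingQueue<String> smallFiles = new LinkedBlockingQueue<>();
        smallFiles.add("small-1");
        // The large-file worker has nothing of its own, so it picks up a small file.
        System.out.println(nextJob(largeFiles, smallFiles));
    }
}
```

The point of such a scheme is that the large-file thread does not sit idle while small files pile up, yet large files still get served first by that thread whenever any are queued.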

> Separate small/large file delete threads in HFileCleaner to accelerate archived hfile
cleanup speed
> ---------------------------------------------------------------------------------------------------
>                 Key: HBASE-17215
>                 URL: https://issues.apache.org/jira/browse/HBASE-17215
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Yu Li
>            Assignee: Yu Li
>         Attachments: HBASE-17215.patch
> When using PCIe-SSD the flush speed is really quick, and although we have per-CF
flush, we still have the {{hbase.regionserver.optionalcacheflushinterval}} setting and some
other mechanisms that avoid keeping data in memory for too long, and these flush small hfiles.
In our online environment we found that the single-threaded cleaner kept cleaning the
earlier-flushed small files while large files got no chance, which filled up the disks and
then caused many other problems.
> Deleting hfiles in parallel with too many threads will also increase the workload of
the namenode, so here we propose separating the large and small hfile cleaner threads, just
like we do for compaction, and it turned out to work well in our cluster.

This message was sent by Atlassian JIRA
