hbase-issues mailing list archives

From "Ted Yu (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-18084) Improve CleanerChore to clean from directory which consumes more disk space
Date Sun, 21 May 2017 08:20:04 GMT

    [ https://issues.apache.org/jira/browse/HBASE-18084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16018745#comment-16018745 ]

Ted Yu commented on HBASE-18084:
--------------------------------

bq. the fs.getContentSummary call is time consuming if there're many files in the directory

In the patch, we obtain the size of every directory before cleaning the sorted directory
list.
Have you thought about having two threads do the sorting and cleaning in parallel:

thread 1 does the sorting and publishes a sorted directory list every N directories (in batches).
thread 2 does the cleaning and updates its list whenever thread 1 provides a new one (minus the
directories it has already cleaned).

The rationale behind this design is to start cleaning without waiting for the complete directory
list. It is fine for thread 2 to clean a small directory first, because no time is wasted waiting
for the complete list to come out.
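
A rough sketch of the two-thread idea, for illustration only; it is not code from the attached patches. All names here (BatchedCleanerSketch, SizedDir, cleanLargestFirst, sizeOf, cleaner) are made up for this example. sizeOf stands in for whatever measures a directory, presumably something like fs.getContentSummary(dir).getLength() in a real chore, and cleaner stands in for the actual delete logic. Instead of re-sorting a list, the sketch uses a priority queue so that thread 2 always takes the largest directory published so far, and a directory leaves the queue once it is taken.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.concurrent.PriorityBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.Consumer;
import java.util.function.ToLongFunction;

class BatchedCleanerSketch {
  /** A directory path paired with its measured size in bytes (hypothetical helper). */
  static final class SizedDir {
    final String path;
    final long bytes;
    SizedDir(String path, long bytes) { this.path = path; this.bytes = bytes; }
  }

  /**
   * Thread 1 sizes directories and publishes them in batches of batchSize.
   * Thread 2 repeatedly cleans the largest directory published so far.
   * Returns the paths in the order they were cleaned. That order is only
   * approximately largest-first, because cleaning starts before all sizes are known.
   */
  static List<String> cleanLargestFirst(List<String> dirs, int batchSize,
      ToLongFunction<String> sizeOf, Consumer<String> cleaner) {
    if (batchSize < 1) {
      throw new IllegalArgumentException("batchSize must be >= 1");
    }
    // Largest directory at the head; taking an entry removes it, so nothing is cleaned twice.
    PriorityBlockingQueue<SizedDir> pending = new PriorityBlockingQueue<>(
        16, Comparator.comparingLong((SizedDir d) -> d.bytes).reversed());
    AtomicBoolean sizingDone = new AtomicBoolean(false);
    // Written only by the cleaning thread, read only after join().
    List<String> cleanedOrder = new ArrayList<>();

    Thread sizer = new Thread(() -> {
      try {
        List<SizedDir> batch = new ArrayList<>(batchSize);
        for (String dir : dirs) {
          batch.add(new SizedDir(dir, sizeOf.applyAsLong(dir)));
          if (batch.size() == batchSize) {
            pending.addAll(batch); // publish one batch of N directories
            batch.clear();
          }
        }
        pending.addAll(batch); // publish the final, possibly partial, batch
      } finally {
        sizingDone.set(true);
      }
    });

    Thread cleanerThread = new Thread(() -> {
      try {
        while (true) {
          // Read the flag before polling: if sizing had already finished and the
          // queue is still empty, there is nothing left to clean.
          boolean done = sizingDone.get();
          SizedDir next = pending.poll(10, TimeUnit.MILLISECONDS);
          if (next != null) {
            cleaner.accept(next.path);
            cleanedOrder.add(next.path);
          } else if (done) {
            return;
          }
        }
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    });

    sizer.start();
    cleanerThread.start();
    try {
      sizer.join();
      cleanerThread.join();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("interrupted while waiting for cleaning to finish", e);
    }
    return cleanedOrder;
  }
}
```

With this shape, thread 2 may well clean a small directory early if it is the largest one known at that moment, which is the trade-off described above.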

What do you think?

> Improve CleanerChore to clean from directory which consumes more disk space
> ---------------------------------------------------------------------------
>
>                 Key: HBASE-18084
>                 URL: https://issues.apache.org/jira/browse/HBASE-18084
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Yu Li
>            Assignee: Yu Li
>         Attachments: HBASE-18084.patch, HBASE-18084.v2.patch
>
>
> Currently CleanerChore cleans directories in dictionary order, rather than starting from the
> directory with the largest space usage. When data abnormally accumulates to a huge volume
> in the archive directory, the cleaning speed might not be enough.
> This proposal is another improvement working together with HBASE-18083 to resolve our
> online issue (the archive dir consumed more than 1.8PB of SSD space).



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
