Date: Wed, 8 Nov 2017 03:33:00 +0000 (UTC)
From: "Reid Chan (JIRA)"
To: issues@hbase.apache.org
Subject: [jira] [Comment Edited] (HBASE-18309) Support multi threads in CleanerChore

    [ https://issues.apache.org/jira/browse/HBASE-18309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16243332#comment-16243332 ]

Reid Chan edited comment on HBASE-18309 at 11/8/17 3:32 AM:
------------------------------------------------------------

bq. for a reasonable test please use a larger scale and include your reasoning, 10 doesn't seem like enough to simulate what will happen in a deployment. e.g. X regions per server, Y servers means Z directories to clean up.

There is no need to use a really large scale; it can be simulated by creating 1000 sub-directories under the root dir, with each sub-directory containing up to 1000 files and sub-directories of its own. WDYT? I will provide statistics later.
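Roughly what I have in mind for the simulation (just a sketch; the class name, paths and counts below are illustrative, not taken from the patch):

{code:java}
import java.util.concurrent.ThreadLocalRandom;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class FakeCleanerLayout {
  public static void main(String[] args) throws Exception {
    // Build a fake oldWALs-like layout: 1000 sub dirs under a root dir,
    // each holding up to 1000 empty files, so the chore has plenty to scan.
    // (Files only here; nested sub dirs could be added the same way.)
    FileSystem fs = FileSystem.get(new Configuration());
    Path root = new Path("/tmp/fake-cleaner-root");
    for (int i = 0; i < 1000; i++) {
      Path sub = new Path(root, "subdir-" + i);
      fs.mkdirs(sub);
      int files = ThreadLocalRandom.current().nextInt(1000) + 1;
      for (int j = 0; j < files; j++) {
        fs.createNewFile(new Path(sub, "file-" + j));
      }
    }
  }
}
{code}

Running the cleaner chore against such a tree first with 1 thread and then with N threads should give comparable numbers.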
bq. At what point will tuning this parameter cause a NameNode to fall over? How do we stop folks from doing that accidentally?

I'm not sure, and that is why the parameter's upper limit is the machine's number of available cores. But observation from my production cluster (1000+ nodes), whose NameNode (24 cores) has been running for months while handling hundreds of jobs doing deletions and creations every day, shows that it is not easy for the cleaner chore to achieve that, XD. For safety, I would suggest setting it to a value less than or equal to the NameNode's core count; see the small sketch at the end of this message for how the cap could work.

bq. These details should probably be in the documentation about the config.

Got it. I will also document it in hbase-default.xml with a description, if required.


> Support multi threads in CleanerChore
> -------------------------------------
>
>                 Key: HBASE-18309
>                 URL: https://issues.apache.org/jira/browse/HBASE-18309
>             Project: HBase
>          Issue Type: Improvement
>          Components: wal
>            Reporter: binlijin
>            Assignee: Reid Chan
>         Attachments: HBASE-18309.master.001.patch, HBASE-18309.master.002.patch
>
>
> There is only one thread in the LogCleaner to clean oldWALs, and in our big cluster we found this is not enough. The number of files under oldWALs reached the max-directory-items limit of HDFS and caused a region server crash, so we use multiple threads for the LogCleaner and the crash has not happened any more.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
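Sketch mentioned above, for how the configured thread count could be capped by the available cores (the property key and default below are made up for illustration; they are not the names from the patch):

{code:java}
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

import org.apache.hadoop.conf.Configuration;

public class CleanerPoolSizeSketch {
  public static ExecutorService newCleanerPool(Configuration conf) {
    // Hypothetical key and default; the real names come from the patch/review.
    int requested = conf.getInt("hbase.cleaner.chore.threads", 1);
    int cores = Runtime.getRuntime().availableProcessors();
    // Never exceed the number of available cores, and keep at least one thread.
    int poolSize = Math.max(1, Math.min(requested, cores));
    return Executors.newFixedThreadPool(poolSize);
  }
}
{code}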