Date: Wed, 8 Nov 2017 03:33:00 +0000 (UTC)
From: "Reid Chan (JIRA)"
To: issues@hbase.apache.org
Subject: [jira] [Comment Edited] (HBASE-18309) Support multi threads in CleanerChore

    [ https://issues.apache.org/jira/browse/HBASE-18309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16243332#comment-16243332 ]

Reid Chan edited comment on HBASE-18309 at 11/8/17 3:32 AM:
------------------------------------------------------------

bq. for a reasonable test please use a larger scale and include your reasoning, 10 doesn't seem like enough to simulate what will happen in a deployment. e.g. X regions per server, Y servers means Z directories to clean up.

There is no need to use a really large scale; it can be simulated by creating 1000 sub-directories under the root dir, with each sub-directory containing up to 1000 files and sub-directories of its own. WDYT? I will provide statistics later.
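Roughly what I have in mind for the simulation (just a sketch; the class name, paths and counts below are illustrative, not taken from the patch):

{code:java}
import java.util.concurrent.ThreadLocalRandom;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class FakeCleanerLayout {
  public static void main(String[] args) throws Exception {
    // Build a fake oldWALs-like layout: 1000 sub dirs under a root dir,
    // each holding up to 1000 empty files, so the chore has plenty to scan.
    // (Files only here; nested sub dirs could be added the same way.)
    FileSystem fs = FileSystem.get(new Configuration());
    Path root = new Path("/tmp/fake-cleaner-root");
    for (int i = 0; i < 1000; i++) {
      Path sub = new Path(root, "subdir-" + i);
      fs.mkdirs(sub);
      int files = ThreadLocalRandom.current().nextInt(1000) + 1;
      for (int j = 0; j < files; j++) {
        fs.createNewFile(new Path(sub, "file-" + j));
      }
    }
  }
}
{code}

Running the cleaner chore against such a tree first with 1 thread and then with N threads should give comparable numbers.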
bq. At what point will tuning this parameter cause a NameNode to fall over? How do we stop folks from doing that accidentally?

I'm not sure, and that is why the parameter's upper limit is the machine's number of available cores. But observation from my production cluster (1000+ nodes), whose NameNode (24 cores) has been running for months while handling hundreds of jobs doing deletions and creations every day, shows that it is not easy for the cleaner chore to achieve that, XD. For safety, I would suggest setting it to a value less than or equal to the NameNode's core count; see the small sketch at the end of this message for how the cap could work.

bq. These details should probably be in the documentation about the config.

Got it. I will also document it in hbase-default.xml with a description, if required.


> Support multi threads in CleanerChore
> -------------------------------------
>
>                 Key: HBASE-18309
>                 URL: https://issues.apache.org/jira/browse/HBASE-18309
>             Project: HBase
>          Issue Type: Improvement
>          Components: wal
>            Reporter: binlijin
>            Assignee: Reid Chan
>         Attachments: HBASE-18309.master.001.patch, HBASE-18309.master.002.patch
>
>
> There is only one thread in the LogCleaner to clean oldWALs, and in our big cluster we found this is not enough. The number of files under oldWALs reached the max-directory-items limit of HDFS and caused a region server crash, so we use multiple threads for the LogCleaner and the crash has not happened any more.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
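Sketch mentioned above, for how the configured thread count could be capped by the available cores (the property key and default below are made up for illustration; they are not the names from the patch):

{code:java}
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

import org.apache.hadoop.conf.Configuration;

public class CleanerPoolSizeSketch {
  public static ExecutorService newCleanerPool(Configuration conf) {
    // Hypothetical key and default; the real names come from the patch/review.
    int requested = conf.getInt("hbase.cleaner.chore.threads", 1);
    int cores = Runtime.getRuntime().availableProcessors();
    // Never exceed the number of available cores, and keep at least one thread.
    int poolSize = Math.max(1, Math.min(requested, cores));
    return Executors.newFixedThreadPool(poolSize);
  }
}
{code}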