Date: Fri, 24 Feb 2017 19:23:44 +0000 (UTC)
From: "Vihang Karajgaonkar (JIRA)"
To: issues@hive.apache.org
Reply-To: dev@hive.apache.org
Subject: [jira] [Updated] (HIVE-15879) Fix HiveMetaStoreChecker.checkPartitionDirs method

     [ https://issues.apache.org/jira/browse/HIVE-15879?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vihang Karajgaonkar updated HIVE-15879:
---------------------------------------
    Attachment: HIVE-15879.04.patch

> Fix HiveMetaStoreChecker.checkPartitionDirs method
> --------------------------------------------------
>
>                 Key: HIVE-15879
>                 URL: https://issues.apache.org/jira/browse/HIVE-15879
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Vihang Karajgaonkar
>            Assignee: Vihang Karajgaonkar
>         Attachments: HIVE-15879.01.patch, HIVE-15879.02.patch, HIVE-15879.03.patch, HIVE-15879.04.patch
>
>
> HIVE-15803 fixes the msck hang in the HiveMetaStoreChecker.checkPartitionDirs method by checking whether the thread pool has any spare threads. If it does not, the method falls back to single-threaded listing of the files.
> {noformat}
>     if (pool != null) {
>       synchronized (pool) {
>         // In case of recursive calls, it is possible to deadlock with TP. Check TP usage here.
>         if (pool.getActiveCount() < pool.getMaximumPoolSize()) {
>           useThreadPool = true;
>         }
>         if (!useThreadPool) {
>           if (LOG.isDebugEnabled()) {
>             LOG.debug("Not using threadPool as active count:" + pool.getActiveCount()
>                 + ", max:" + pool.getMaximumPoolSize());
>           }
>         }
>       }
>     }
> {noformat}
> The Javadoc for getActiveCount() says:
> bq. Returns the approximate number of threads that are actively executing tasks.
> Because the count is only approximate, there is no guarantee that it matches the exact number of active threads. The method is therefore still exposed to the msck hang in rare corner cases.
> We could either:
> 1. Use an atomic counter to track exactly how many threads are actively running.
> 2. Revisit the method itself to make it much simpler. For example, look into changing the recursive implementation to an iterative one in which worker threads pick tasks from a queue until the queue is empty.

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
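
Option 2 in the description above could look roughly like the following. This is a minimal sketch under stated assumptions, not the code in any of the attached patches: it walks a local directory tree with java.nio.file rather than the Hadoop FileSystem API, and the class name IterativeDirWalker and the method names findLeafDirs and listSubDirs are hypothetical. It is also a level-by-level variant of the idea rather than a shared work queue. The point it illustrates is that pool tasks only list a single directory and never wait on other tasks, so the traversal cannot deadlock however small the pool is, and getActiveCount() is not consulted at all.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;

// Hypothetical helper, for illustration only; not part of Hive.
final class IterativeDirWalker {

  /**
   * Returns the directories found exactly maxDepth levels below root.
   * The calling thread drives the traversal one level at a time; worker
   * threads only list one directory each and never submit or await tasks.
   */
  static List<Path> findLeafDirs(Path root, int maxDepth, ExecutorService pool)
      throws IOException, InterruptedException {
    List<Path> currentLevel = new ArrayList<>();
    currentLevel.add(root);

    for (int depth = 0; depth < maxDepth && !currentLevel.isEmpty(); depth++) {
      // One independent listing task per directory at this level.
      List<Callable<List<Path>>> tasks = new ArrayList<>();
      for (Path dir : currentLevel) {
        tasks.add(() -> listSubDirs(dir));
      }

      // invokeAll blocks only the calling thread until the level is done.
      List<Path> nextLevel = new ArrayList<>();
      for (Future<List<Path>> result : pool.invokeAll(tasks)) {
        try {
          nextLevel.addAll(result.get());
        } catch (ExecutionException e) {
          throw new IOException("Directory listing failed", e.getCause());
        }
      }
      currentLevel = nextLevel;
    }
    return currentLevel;
  }

  /** Lists the immediate subdirectories of dir, ignoring plain files. */
  private static List<Path> listSubDirs(Path dir) throws IOException {
    List<Path> subDirs = new ArrayList<>();
    try (DirectoryStream<Path> children = Files.newDirectoryStream(dir)) {
      for (Path child : children) {
        if (Files.isDirectory(child)) {
          subDirs.add(child);
        }
      }
    }
    return subDirs;
  }
}
```

For a table partitioned on two columns, calling findLeafDirs(tablePath, 2, pool) would return the second-level partition directories. Because the caller is the only thread that waits, the same call completes even with a single-threaded pool, which is the case where a recursive implementation that blocks inside pool tasks can hang.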