Date: Fri, 17 Mar 2017 18:44:41 +0000 (UTC)
From: "Alexander Behm (JIRA)"
To: issues@impala.incubator.apache.org
Reply-To: dev@impala.incubator.apache.org
Subject: [jira] [Updated] (IMPALA-5090) Improve the logging of causes for "unknown disk id" including possible workarounds

     [ https://issues.apache.org/jira/browse/IMPALA-5090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alexander Behm updated IMPALA-5090:
-----------------------------------
    Issue Type: Improvement  (was: Bug)

> Improve the logging of causes for "unknown disk id" including possible workarounds
> ----------------------------------------------------------------------------------
>
>                 Key: IMPALA-5090
>                 URL: https://issues.apache.org/jira/browse/IMPALA-5090
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Catalog
>    Affects Versions: Impala 2.5.0, Impala 2.4.0, Impala 2.6.0, Impala 2.7.0, Impala 2.8.0
>            Reporter: Alexander Behm
>            Priority: Critical
>              Labels: catalog-server, supportability
>
> A frequent cause of "unknown disk id" warnings during query execution is that at the time of table loading one of the DNs holding relevant data was overloaded and could not give a timely response to dfs.getFileBlockStorageLocations() calls from the CatalogServer.
> You will find messages similar to this in the catalogd logs at the time of table loading:
> {code}
> I0315 07:30:49.752166 33660 BlockStorageLocationUtil.java:167] Cancelled while waiting for datanode 10.17.184.31:50020: java.util.concurrent.CancellationException
> I0315 07:30:49.752351 33660 BlockStorageLocationUtil.java:167] Cancelled while waiting for datanode 10.17.184.32:50020: java.util.concurrent.CancellationException
> I0315 07:30:49.752465 33660 BlockStorageLocationUtil.java:167] Cancelled while waiting for datanode 10.17.182.22:50020: java.util.concurrent.CancellationException
> {code}
> Also look for "Unknown disk id count for filesystem" in the catalogd logs to see how many missing disk ids were found in total.
> This JIRA is for improving the error reporting dumped to the catalogd log when disk ids fail to load due to DN issues. In particular, the values for the following DN configuration options are often set pretty aggressively.
> * dfs.datanode.handler.count
> * dfs.client.file-block-storage-locations.timeout.millis
> The logging should include the current setting of these configs and mention that increasing them might mitigate the disk id issues on a busy cluster.
> In addition, we should consider enhancing the BE "unknown disk id" warning to include possible causes (heavy load on HDFS) and to recommend examining the catalogd logs for more information.
> Note that this improvement is only relevant to Impala versions prior to IMPALA-4172 because after that change we no longer contact the DNs for disk ids.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
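
For anyone triaging this by hand in the meantime, the log lines quoted in the issue can be tallied per datanode to see which DNs were slow to respond during table loading. The following is only a rough sketch, not part of Impala: the function name `count_cancelled_datanodes` and the regular expression are assumptions derived from the three sample lines above, and real catalogd logs may format the message differently.

```python
import re
from collections import Counter

# Assumed pattern, derived only from the sample log lines quoted in the issue:
# "... Cancelled while waiting for datanode <host:port>: <exception>"
CANCELLED_RE = re.compile(r"Cancelled while waiting for datanode (\S+): ")


def count_cancelled_datanodes(lines):
    """Return a Counter mapping datanode address to the number of cancelled waits."""
    counts = Counter()
    for line in lines:
        match = CANCELLED_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# The three sample lines from the issue description.
SAMPLE_LOG = [
    "I0315 07:30:49.752166 33660 BlockStorageLocationUtil.java:167] Cancelled while waiting for datanode 10.17.184.31:50020: java.util.concurrent.CancellationException",
    "I0315 07:30:49.752351 33660 BlockStorageLocationUtil.java:167] Cancelled while waiting for datanode 10.17.184.32:50020: java.util.concurrent.CancellationException",
    "I0315 07:30:49.752465 33660 BlockStorageLocationUtil.java:167] Cancelled while waiting for datanode 10.17.182.22:50020: java.util.concurrent.CancellationException",
]

if __name__ == "__main__":
    # Print datanodes ordered by how often a wait on them was cancelled.
    for datanode, count in count_cancelled_datanodes(SAMPLE_LOG).most_common():
        print(f"{datanode}\t{count}")
```

To run it against a real log, pass an open file object instead of `SAMPLE_LOG`; a datanode that dominates the tally is a plausible candidate for the overload described in the issue.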