impala-issues mailing list archives

From "Alexander Behm (JIRA)" <j...@apache.org>
Subject [jira] [Created] (IMPALA-5090) Improve the logging of causes for "unknown disk id" including possible workarounds
Date Fri, 17 Mar 2017 18:42:41 GMT
Alexander Behm created IMPALA-5090:
--------------------------------------

             Summary: Improve the logging of causes for "unknown disk id" including possible workarounds
                 Key: IMPALA-5090
                 URL: https://issues.apache.org/jira/browse/IMPALA-5090
             Project: IMPALA
          Issue Type: Bug
    Affects Versions: Impala 2.5.0, Impala 2.4.0, Impala 2.6.0, Impala 2.7.0, Impala 2.8.0
            Reporter: Alexander Behm
            Priority: Critical


A frequent cause of "unknown disk id" warnings during query execution is that at the time
of table loading one of the DNs holding relevant data was overloaded and could not give a
timely response to dfs.getFileBlockStorageLocations() calls from the CatalogServer.

You will find messages similar to this in the catalogd logs at the time of table loading:
{code}
I0315 07:30:49.752166 33660 BlockStorageLocationUtil.java:167] Cancelled while waiting for datanode 10.17.184.31:50020: java.util.concurrent.CancellationException
I0315 07:30:49.752351 33660 BlockStorageLocationUtil.java:167] Cancelled while waiting for datanode 10.17.184.32:50020: java.util.concurrent.CancellationException
I0315 07:30:49.752465 33660 BlockStorageLocationUtil.java:167] Cancelled while waiting for datanode 10.17.182.22:50020: java.util.concurrent.CancellationException
{code}

Also look for "Unknown disk id count for filesystem" in the catalogd logs to see how many
missing disk ids were found in total.
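
As a rough aid for the log inspection described above, the following Python sketch counts the "Cancelled while waiting for datanode" messages per DN and collects the "Unknown disk id count for filesystem" lines. It is not part of Impala; the function name `scan_catalogd_log` is hypothetical, and it assumes each log entry occupies a single line in the catalogd log file.

```python
import re
from collections import Counter

# Pattern derived from the sample catalogd lines quoted in this issue.
CANCELLED_RE = re.compile(r"Cancelled while waiting for\s+datanode\s+(\S+?):\s")
UNKNOWN_COUNT_MARKER = "Unknown disk id count for filesystem"


def scan_catalogd_log(lines):
    """Return (per-DN cancellation counts, lines reporting unknown disk id totals).

    `lines` is any iterable of log lines, e.g. an open file handle.
    """
    cancelled = Counter()
    unknown_count_lines = []
    for line in lines:
        match = CANCELLED_RE.search(line)
        if match:
            # group(1) is the datanode address, e.g. "10.17.184.31:50020"
            cancelled[match.group(1)] += 1
        if UNKNOWN_COUNT_MARKER in line:
            unknown_count_lines.append(line.rstrip("\n"))
    return cancelled, unknown_count_lines
```

To use it, pass an open handle to the catalogd INFO log, for example `with open(path) as f: scan_catalogd_log(f)`, where `path` depends on your deployment. DNs that appear often in the first result are candidates for the overload described above.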

This JIRA is for improving the error reporting dumped to the catalogd log when disk ids fail
to load due to DN issues. In particular, the values for the following DN configuration options
are often set pretty aggressively.
* dfs.datanode.handler.count
* dfs.client.file-block-storage-locations.timeout.millis
The logging should include the current settings of these configs and mention that increasing
them might mitigate the disk id issues on a busy cluster.
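
To illustrate the kind of message proposed here, the sketch below composes a warning that reports the current values of the two options listed above. It is only an illustration of possible wording: the helper `format_disk_id_warning` and the message text are hypothetical, and the actual change would live in the catalogd code rather than in Python.

```python
# The two HDFS options named in this issue.
CONFIG_KEYS = (
    "dfs.datanode.handler.count",
    "dfs.client.file-block-storage-locations.timeout.millis",
)


def format_disk_id_warning(num_unknown, conf):
    """Build a hypothetical warning for `num_unknown` disk ids that failed to load.

    `conf` maps configuration keys to their current values; keys that are
    absent are reported as '<unset>'.
    """
    settings = ", ".join(
        f"{key}={conf.get(key, '<unset>')}" for key in CONFIG_KEYS
    )
    return (
        f"Failed to load {num_unknown} disk id(s); datanodes may have been too "
        f"busy to respond in time. Current settings: {settings}. "
        "Increasing these values might mitigate the issue on a busy cluster."
    )
```

Reporting the effective values next to the failure saves the reader from having to look them up separately when deciding whether to raise them.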

In addition, we should consider enhancing the BE "unknown disk id" warning to include possible
causes (heavy load on HDFS) and to recommend examining the catalogd logs for more information.

Note that this improvement is only relevant to Impala versions prior to IMPALA-4172 because
after that change we no longer contact the DNs for disk ids.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
