impala-issues mailing list archives

From "Alexander Behm (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (IMPALA-5090) Improve the logging of causes for "unknown disk id" including possible workarounds
Date Fri, 17 Mar 2017 18:44:41 GMT

     [ https://issues.apache.org/jira/browse/IMPALA-5090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alexander Behm updated IMPALA-5090:
-----------------------------------
    Component/s: Catalog

> Improve the logging of causes for "unknown disk id" including possible workarounds
> ----------------------------------------------------------------------------------
>
>                 Key: IMPALA-5090
>                 URL: https://issues.apache.org/jira/browse/IMPALA-5090
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Catalog
>    Affects Versions: Impala 2.5.0, Impala 2.4.0, Impala 2.6.0, Impala 2.7.0, Impala 2.8.0
>            Reporter: Alexander Behm
>            Priority: Critical
>              Labels: catalog-server, supportability
>
> A frequent cause of "unknown disk id" warnings during query execution is that, at the time of table loading, one of the DataNodes (DNs) holding relevant data was overloaded and could not respond in time to dfs.getFileBlockStorageLocations() calls from the CatalogServer.
> You will find messages similar to this in the catalogd logs at the time of table loading:
> {code}
> I0315 07:30:49.752166 33660 BlockStorageLocationUtil.java:167] Cancelled while waiting for datanode 10.17.184.31:50020: java.util.concurrent.CancellationException
> I0315 07:30:49.752351 33660 BlockStorageLocationUtil.java:167] Cancelled while waiting for datanode 10.17.184.32:50020: java.util.concurrent.CancellationException
> I0315 07:30:49.752465 33660 BlockStorageLocationUtil.java:167] Cancelled while waiting for datanode 10.17.182.22:50020: java.util.concurrent.CancellationException
> {code}
> Also look for "Unknown disk id count for filesystem" in the catalogd logs to see how many missing disk ids were found in total.
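As a rough aid for the log inspection described above, here is a minimal Python sketch that tallies the "Cancelled while waiting for datanode" messages per DataNode and collects the "Unknown disk id count for filesystem" lines. The function names and the regular expression are illustrative assumptions based only on the log excerpt quoted in this issue; they are not part of Impala.

```python
import re
from collections import Counter

# Pattern derived from the catalogd excerpt quoted in this issue (an assumption,
# not an official log format). Captures the datanode as host:port.
CANCELLED_RE = re.compile(r"Cancelled while waiting for datanode (\S+:\d+): ")
UNKNOWN_DISK_ID_PHRASE = "Unknown disk id count for filesystem"


def count_cancelled_datanodes(lines):
    """Return a Counter mapping datanode address to number of cancelled waits."""
    counts = Counter()
    for line in lines:
        match = CANCELLED_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def find_unknown_disk_id_lines(lines):
    """Return the log lines that report an unknown disk id count."""
    return [line for line in lines if UNKNOWN_DISK_ID_PHRASE in line]


# The three messages from the excerpt above, each on a single line.
SAMPLE_LOG = [
    "I0315 07:30:49.752166 33660 BlockStorageLocationUtil.java:167] Cancelled while waiting for datanode 10.17.184.31:50020: java.util.concurrent.CancellationException",
    "I0315 07:30:49.752351 33660 BlockStorageLocationUtil.java:167] Cancelled while waiting for datanode 10.17.184.32:50020: java.util.concurrent.CancellationException",
    "I0315 07:30:49.752465 33660 BlockStorageLocationUtil.java:167] Cancelled while waiting for datanode 10.17.182.22:50020: java.util.concurrent.CancellationException",
]

if __name__ == "__main__":
    for datanode, count in count_cancelled_datanodes(SAMPLE_LOG).most_common():
        print(datanode, count)
```

A DataNode that shows up repeatedly in this tally around the time of table loading is a plausible candidate for the overloaded node.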
> This JIRA is for improving the error reporting written to the catalogd log when disk ids fail to load due to DN issues. In particular, the values of the following configuration options are often set too aggressively for a busy cluster:
> * dfs.datanode.handler.count
> * dfs.client.file-block-storage-locations.timeout.millis
> The logging should include the current settings of these configs and mention that increasing them might mitigate the disk id issues on a busy cluster.
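To make the proposal concrete, the following Python sketch shows one possible shape for such a log message. It is only an illustration: the function name, the message wording, and the use of a plain dict as a stand-in for the loaded configuration are all assumptions, and the actual change would live in the catalogd code rather than in a script like this.

```python
# The two config keys named in this issue.
CONFIG_KEYS = (
    "dfs.datanode.handler.count",
    "dfs.client.file-block-storage-locations.timeout.millis",
)


def format_disk_id_warning(conf, num_unknown):
    """Build a hypothetical warning that reports the current config values.

    `conf` is a plain dict standing in for the loaded configuration;
    keys that are missing are reported as '<unset>'.
    """
    settings = ", ".join(
        f"{key}={conf.get(key, '<unset>')}" for key in CONFIG_KEYS
    )
    return (
        f"Failed to load {num_unknown} disk id(s), possibly because a datanode "
        f"was overloaded. Current settings: {settings}. "
        "Increasing these values might mitigate the issue on a busy cluster."
    )


if __name__ == "__main__":
    # Example values only; they are not recommendations.
    example_conf = {"dfs.datanode.handler.count": "10"}
    print(format_disk_id_warning(example_conf, 3))
```

Reporting the values alongside the failure would let an operator see at a glance whether either setting is a likely culprit, without having to hunt through the configuration files.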
> In addition, we should consider enhancing the backend (BE) "unknown disk id" warning to include possible causes (such as heavy load on HDFS) and to recommend examining the catalogd logs for more information.
> Note that this improvement is only relevant to Impala versions prior to IMPALA-4172 because after that change we no longer contact the DNs for disk ids.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
