tez-issues mailing list archives

From László Bodor (Jira) <j...@apache.org>
Subject [jira] [Updated] (TEZ-4097) Report localHostname in Fetcher and FetcherOrderedGrouped failure log messages
Date Tue, 05 Nov 2019 14:02:00 GMT

     [ https://issues.apache.org/jira/browse/TEZ-4097?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

László Bodor updated TEZ-4097:
------------------------------
    Summary: Report localHostname in Fetcher and FetcherOrderedGrouped failure log messages (was: Report localHostname in Fetcher failure log messages)

> Report localHostname in Fetcher and FetcherOrderedGrouped failure log messages
> ------------------------------------------------------------------------------
>
>                 Key: TEZ-4097
>                 URL: https://issues.apache.org/jira/browse/TEZ-4097
>             Project: Apache Tez
>          Issue Type: Improvement
>            Reporter: László Bodor
>            Assignee: László Bodor
>            Priority: Minor
>
> Currently, a fetch failure is reported like this:
> {code}
> 2019-11-05 02:50:35,972 [WARN] [Fetcher_B {Map_4} #1] |shuffle.Fetcher|: Fetch Failure from host while connecting: other_host, attempt: InputAttemptIdentifier [inputIdentifier=1, attemptNumber=0, pathComponent=attempt_1572936153637_0005_1_00_000000_0_10003, spillType=0, spillId=-1] Informing ShuffleManager:
> java.net.SocketTimeoutException: Read timed out
> ...
> {code}
> When debugging network, SSL, or similar issues on a cluster, it would be convenient to see the local host's name in these messages. The fetcher already holds it as the localHostname property, but in logs collected by the YARN CLI it is not obvious at first sight which host emitted a given message.
> The same applies to FetcherOrderedGrouped, which reports something like:
> {code}
> 2019-11-05 03:13:11,046 [WARN] [Fetcher_O {Map_1} #0] |orderedgrouped.FetcherOrderedGrouped|: Failed to verify reply after connecting to other_host:13562 with 1 inputs pending
> javax.net.ssl.SSLHandshakeException: sun.security.validator.ValidatorException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target
> {code}
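
As a rough illustration of the request above, a failure message could carry the local host name next to the remote one. This is only a sketch under stated assumptions: the class FetchFailureMessage, its methods, and the exact wording of the new message are hypothetical and are not taken from the Tez code base or from any patch on this issue. Only the localHostname property name and the existing message text come from the description.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

/** Hypothetical helper for illustration only; not part of Apache Tez. */
final class FetchFailureMessage {
    private FetchFailureMessage() {}

    /**
     * Resolves the local host name with the JDK, falling back to a
     * placeholder. A real fetcher would presumably reuse its existing
     * localHostname property instead of resolving it again.
     */
    static String resolveLocalHostname() {
        try {
            return InetAddress.getLocalHost().getHostName();
        } catch (UnknownHostException e) {
            return "unknown-local-host";
        }
    }

    /**
     * Builds a connect-failure message that names both ends of the
     * connection. The wording is an assumption modelled on the existing
     * "Fetch Failure ... while connecting" message.
     */
    static String connectFailure(String localHostname, String remoteHost, String attempt) {
        return "Fetch Failure while connecting from " + localHostname
            + " to: " + remoteHost
            + ", attempt: " + attempt
            + " Informing ShuffleManager:";
    }
}
```

With a message shaped like this, a line pulled out of aggregated YARN logs would identify both the fetching node and the node it failed to reach, without cross-referencing container placement.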



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
