drill-issues mailing list archives

From "Timothy Farkas (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (DRILL-6143) Make Fragment Runner's RPC Timeout a SystemOption
Date Fri, 09 Feb 2018 05:17:00 GMT

     [ https://issues.apache.org/jira/browse/DRILL-6143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Timothy Farkas updated DRILL-6143:
----------------------------------
    Summary: Make Fragment Runner's RPC Timeout a SystemOption  (was: Queries Fail Due To Aggressive Hardcoded RPC Timeout)

> Make Fragment Runner's RPC Timeout a SystemOption
> -------------------------------------------------
>
>                 Key: DRILL-6143
>                 URL: https://issues.apache.org/jira/browse/DRILL-6143
>             Project: Apache Drill
>          Issue Type: Bug
>    Affects Versions: 1.13.0
>            Reporter: Timothy Farkas
>            Assignee: Timothy Farkas
>            Priority: Major
>             Fix For: 1.13.0
>
>
> Queries fail sporadically on some clusters with the following error:
> {code}
> oadd.org.apache.drill.common.exceptions.UserRemoteException: CONNECTION ERROR: Exceeded timeout (25000) while waiting send intermediate work fragments to remote nodes. Sent 5 and only heard response back from 4 nodes.
> {code}
> This error occurs because the FragmentsRunner uses a hardcoded timeout, RPC_WAIT_IN_MSECS_PER_FRAGMENT,
> which is set to 5 seconds per fragment (consistent with the 25000 ms reported above for 5 fragments).
> Increasing the timeout to 10 seconds resolved the sporadic failures that were observed. The default
> should be raised to 10 seconds, and the value should also be configurable via the SystemOptionManager.
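
The proposed change can be illustrated with a minimal Java sketch. It is not Drill's actual code: the class name, the option key, and the use of a plain Map in place of the SystemOptionManager are all hypothetical. It also assumes the total wait scales linearly with the number of fragments sent, which matches the 25000 ms in the error for 5 fragments at 5 seconds each.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch, not Drill's actual code. The class, method names,
 * and option key are invented for illustration only.
 */
final class FragmentRpcTimeoutSketch {
  // Assumed option key; the issue does not name the real SystemOption.
  static final String OPTION_KEY = "example.fragment_runner.rpc_timeout_ms";

  // Default proposed in the issue: 10 seconds per fragment.
  static final long DEFAULT_PER_FRAGMENT_MS = 10_000L;

  /** Reads the per-fragment timeout from the options, falling back to the default. */
  static long perFragmentTimeoutMs(Map<String, Long> options) {
    Long configured = options.get(OPTION_KEY);
    if (configured == null) {
      return DEFAULT_PER_FRAGMENT_MS;
    }
    if (configured <= 0) {
      throw new IllegalArgumentException("timeout must be positive: " + configured);
    }
    return configured;
  }

  /** Total wait, assuming it grows linearly with the number of fragments sent. */
  static long totalTimeoutMs(int fragmentCount, long perFragmentMs) {
    return Math.multiplyExact((long) fragmentCount, perFragmentMs);
  }

  public static void main(String[] args) {
    Map<String, Long> options = new HashMap<>();

    // Old hardcoded behavior: 5 fragments at 5 seconds each.
    System.out.println(totalTimeoutMs(5, 5_000L)); // prints 25000

    // New default when the option is unset.
    System.out.println(totalTimeoutMs(5, perFragmentTimeoutMs(options))); // prints 50000

    // An operator raises the timeout for a slow cluster.
    options.put(OPTION_KEY, 20_000L);
    System.out.println(totalTimeoutMs(5, perFragmentTimeoutMs(options))); // prints 100000
  }
}
```

Reading the value at the point of use, rather than caching it in a static constant, is what would let a changed option take effect without a restart in a design like this.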



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
