impala-issues mailing list archives

From "Matthew Jacobs (JIRA)" <j...@apache.org>
Subject [jira] [Resolved] (IMPALA-5576) Wrong Cancel() in QueryState::ReportExecStatusAux() can lead to coordinator hang
Date Mon, 26 Jun 2017 17:08:00 GMT

     [ https://issues.apache.org/jira/browse/IMPALA-5576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matthew Jacobs resolved IMPALA-5576.
------------------------------------
       Resolution: Fixed
    Fix Version/s: Impala 2.10.0

> Wrong Cancel() in QueryState::ReportExecStatusAux() can lead to coordinator hang
> --------------------------------------------------------------------------------
>
>                 Key: IMPALA-5576
>                 URL: https://issues.apache.org/jira/browse/IMPALA-5576
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Distributed Exec
>    Affects Versions: Impala 2.9.0
>            Reporter: Matthew Jacobs
>            Priority: Critical
>              Labels: hang
>             Fix For: Impala 2.10.0
>
>
> Code introduced as part of IMPALA-2550 can cause a hang when QueryState::ReportExecStatusAux()
> fails to get a backend client for the coordinator. In that case the new code cancels the local
> fragment instances and returns without sending a report, so the coordinator never learns the
> status and waits indefinitely for the final reports.
> {code}
> void QueryState::ReportExecStatusAux(bool done, const Status& status,
>     FragmentInstanceState* fis, bool instances_started) {
>   // if we're reporting an error, we're done
>   DCHECK(status.ok() || done);
>   // if this is not for a specific fragment instance, we're reporting an error
>   DCHECK(fis != nullptr || !status.ok());
>   DCHECK(fis == nullptr || fis->IsPrepared());
>   // This will send a report even if we are cancelled.  If the query completed correctly
>   // but fragments still need to be cancelled (e.g. limit reached), the coordinator will
>   // be waiting for a final report and profile.
>   Status coord_status;
>   ImpalaBackendConnection coord(ExecEnv::GetInstance()->impalad_client_cache(),
>       query_ctx().coord_address, &coord_status);
>   if (!coord_status.ok()) {
>     // TODO: this might flood the log
>     LOG(WARNING) << "Couldn't get a client for " << query_ctx().coord_address
>         <<"\tReason: " << coord_status.GetDetail();
>     if (instances_started) Cancel();
>     return;
>   }
> {code}
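
The failure mode described above can be reproduced in miniature. The following is only a toy model, not Impala code: ToyCoordinator, ToyReportExecStatus and SimulateQuery are hypothetical names invented for illustration, and a bounded wait stands in for the coordinator's unbounded one so that the example terminates. It assumes a single backend whose final report is the only thing the coordinator is waiting for.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>

// Hypothetical stand-in for the coordinator: it waits until every backend
// has delivered a final report.
class ToyCoordinator {
 public:
  explicit ToyCoordinator(int num_backends) : pending_(num_backends) {}

  // Called when a backend's final report arrives.
  void ReceiveFinalReport() {
    std::lock_guard<std::mutex> l(lock_);
    if (pending_ > 0) --pending_;
    cv_.notify_all();
  }

  // Returns true if all reports arrived before the timeout expired.
  // (The real coordinator has no such timeout; it is here only so the
  // example finishes.)
  bool WaitForReports(std::chrono::milliseconds timeout) {
    std::unique_lock<std::mutex> l(lock_);
    return cv_.wait_for(l, timeout, [this] { return pending_ == 0; });
  }

 private:
  std::mutex lock_;
  std::condition_variable cv_;
  int pending_;
};

// Hypothetical stand-in for the backend's report path. When no client can
// be obtained it cancels locally and returns early, so nothing is sent.
// Returns true if a report was delivered.
bool ToyReportExecStatus(ToyCoordinator* coord, bool client_available,
                         bool* cancelled_locally) {
  if (!client_available) {
    *cancelled_locally = true;  // mirrors "if (instances_started) Cancel();"
    return false;               // mirrors the early "return;"
  }
  coord->ReceiveFinalReport();
  return true;
}

// Runs one toy query with a single backend. Returns true if the coordinator
// saw the final report, false if it was still waiting at the timeout.
bool SimulateQuery(bool client_available) {
  ToyCoordinator coord(1);
  bool cancelled_locally = false;
  ToyReportExecStatus(&coord, client_available, &cancelled_locally);
  return coord.WaitForReports(std::chrono::milliseconds(50));
}
```

In this model SimulateQuery(true) returns true, while SimulateQuery(false) returns false once the 50 ms wait expires: the backend has cancelled its own work, but the coordinator's pending count never reaches zero. Without the timeout, that second case would be the hang.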



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
