impala-issues mailing list archives

From "Matthew Jacobs (JIRA)" <j...@apache.org>
Subject [jira] [Created] (IMPALA-5576) Wrong Cancel() in QueryState::ReportExecStatusAux() can lead to coordinator hang
Date Sat, 24 Jun 2017 18:28:00 GMT
Matthew Jacobs created IMPALA-5576:
--------------------------------------

             Summary: Wrong Cancel() in QueryState::ReportExecStatusAux() can lead to coordinator hang
                 Key: IMPALA-5576
                 URL: https://issues.apache.org/jira/browse/IMPALA-5576
             Project: IMPALA
          Issue Type: Bug
          Components: Distributed Exec
    Affects Versions: Impala 2.9.0
            Reporter: Matthew Jacobs
            Priority: Critical


Code introduced as part of IMPALA-2550 makes a hang possible when QueryState::ReportExecStatusAux()
fails to get a client for the coordinator. In that case the new code cancels the local fragments
and returns, but the status is never reported to the coordinator, so the coordinator waits
indefinitely for their reports.

{code}
void QueryState::ReportExecStatusAux(bool done, const Status& status,
    FragmentInstanceState* fis, bool instances_started) {
  // if we're reporting an error, we're done
  DCHECK(status.ok() || done);
  // if this is not for a specific fragment instance, we're reporting an error
  DCHECK(fis != nullptr || !status.ok());
  DCHECK(fis == nullptr || fis->IsPrepared());

  // This will send a report even if we are cancelled.  If the query completed correctly
  // but fragments still need to be cancelled (e.g. limit reached), the coordinator will
  // be waiting for a final report and profile.

  Status coord_status;
  ImpalaBackendConnection coord(ExecEnv::GetInstance()->impalad_client_cache(),
      query_ctx().coord_address, &coord_status);
  if (!coord_status.ok()) {
    // TODO: this might flood the log
    LOG(WARNING) << "Couldn't get a client for " << query_ctx().coord_address
        <<"\tReason: " << coord_status.GetDetail();
    if (instances_started) Cancel();
    return;
  }
{code}
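
To make the failure mode concrete, here is a minimal standalone sketch of the control flow. It is not Impala code: CoordinatorModel, BackendModel, OnFinalReport(), AllBackendsDone() and CoordinatorStillWaiting() are hypothetical names invented for illustration, and the assumption that the coordinator tracks one outstanding final report per backend is a simplification. It only mirrors the early-return path in the snippet above.

```cpp
#include <cassert>

// Hypothetical, heavily simplified model. None of these types exist in Impala.
struct CoordinatorModel {
  // Number of backends whose final report has not arrived yet (assumed: one backend).
  int backends_pending = 1;
  bool saw_error = false;

  // Invoked when a backend's final report reaches the coordinator.
  void OnFinalReport(bool ok) {
    if (!ok) saw_error = true;
    --backends_pending;
  }

  // In this model the coordinator blocks until this returns true.
  bool AllBackendsDone() const { return backends_pending == 0; }
};

struct BackendModel {
  bool cancelled = false;

  // Local cancellation only: nothing is sent to the coordinator.
  void Cancel() { cancelled = true; }

  // Mirrors the shape of the snippet above: if no client can be obtained,
  // cancel locally and return without reporting anything.
  void ReportExecStatus(CoordinatorModel* coord, bool client_available,
                        bool instances_started) {
    if (!client_available) {
      if (instances_started) Cancel();
      return;  // the coordinator never hears about the cancellation
    }
    coord->OnFinalReport(/*ok=*/!cancelled);
  }
};

// Returns true if the coordinator is still waiting after the backend's
// single attempt to send its final report.
bool CoordinatorStillWaiting(bool client_available) {
  CoordinatorModel coord;
  BackendModel backend;
  backend.ReportExecStatus(&coord, client_available, /*instances_started=*/true);
  return !coord.AllBackendsDone();
}
```

In this model, CoordinatorStillWaiting(false) returns true: the backend has cancelled itself, but the coordinator's pending count never drops, which corresponds to the hang described above. With client_available set to true the report is delivered and the wait ends.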



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
