impala-reviews mailing list archives

From "Tim Armstrong (Code Review)" <>
Subject [Impala-ASF-CR] IMPALA-4856: Port data stream service to KRPC
Date Mon, 11 Sep 2017 21:56:34 GMT
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-4856: Port data stream service to KRPC

Patch Set 1:

(1 comment)
File be/src/runtime/krpc-data-stream-mgr.h:

Line 129: /// In exceptional circumstances, the data stream manager will garbage-collect the
There's a pre-existing flaw in the reasoning here that we should call out. "Exceptional circumstances"
is vague and I think hides a distinction between an unhealthy cluster with extreme delays
and the expected behaviour of certain long-running queries. I think the problem is an invalid
assumption that the sender sends batches on a regular cadence, with a bounded delay before
the first batch and between each subsequent batch. That assumption is incorrect.
I think we should call it out in this comment so that readers understand the current flaw.
Here's an example where it's wrong.

Consider a plan with three fragments.

  F1 (long-running)
  F2 (limit = 1 on exchange)
  F3 (long-running selective scan)

1. The fragments all start up.
2. Instance 1 of F3 immediately finds and returns a matching row, which is sent to F2.
3. This causes F2 to hit its limit, close its exchange and tear itself down.
4. Let's assume F1 also has a lot of work to do and won't finish for 20 minutes.
5. Instance 2 of F3 is still churning away on the scan. After 10 minutes it finally finds a
matching row.
6. F3 tries to send the row, can't find the receiver after a timeout and returns an error
to the coordinator.
7. The coordinator cancels the query and returns an error.
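
The sequence above can be reproduced with a toy model. This is only a sketch: the class, the status strings and the 120-second retention window are all hypothetical and are not taken from the Impala code. It assumes the manager remembers a closed receiver for a bounded time and forgets it afterwards.

```python
from dataclasses import dataclass, field

# Hypothetical toy model; none of these names or values come from Impala.
CLOSED_STREAM_CACHE_SECS = 120  # assumed retention window for closed receivers


@dataclass
class ToyStreamMgr:
    """Tracks open receivers and remembers closed ones for a bounded time."""
    open_receivers: set = field(default_factory=set)
    closed_at: dict = field(default_factory=dict)  # receiver id -> close time

    def close(self, recvr_id, now):
        self.open_receivers.discard(recvr_id)
        self.closed_at[recvr_id] = now

    def garbage_collect(self, now):
        # Forget closed receivers once the retention window has elapsed.
        self.closed_at = {r: t for r, t in self.closed_at.items()
                          if now - t < CLOSED_STREAM_CACHE_SECS}

    def send_batch(self, recvr_id, now):
        self.garbage_collect(now)
        if recvr_id in self.open_receivers:
            return "OK"
        if recvr_id in self.closed_at:
            return "RECEIVER_CLOSED"  # sender can stop quietly
        # No record left: the sender would time out and report an error.
        return "ERROR_RECEIVER_NOT_FOUND"


def run_scenario():
    mgr = ToyStreamMgr(open_receivers={"F2"})
    results = []
    # Step 2: instance 1 of F3 sends a matching row at t=0.
    results.append(mgr.send_batch("F2", now=0))
    # Step 3: F2 hits its limit and closes its exchange.
    mgr.close("F2", now=1)
    # Steps 5-6: instance 2 of F3 finds a row after 10 minutes.
    results.append(mgr.send_batch("F2", now=600))
    return results


print(run_scenario())
```

In this model the late send fails only because the record of the clean close has already been discarded; a send that arrives inside the retention window would see RECEIVER_CLOSED instead.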

There are two problems here:
1. The query failed when it shouldn't have.
2. F3 wasn't cancelled when it was no longer needed and used lots of resources unnecessarily.

The JIRA is IMPALA-3990. I believe that the main reason we haven't seen this in practice is
that it can only occur when there's a LIMIT without an ORDER BY in a subquery. Most queries with
that property are non-deterministic and it doesn't really make a lot of sense to have a long-running
query that returns non-deterministic results.

But this actually blocked me from implementing early-close for joins with empty build sides,
which is a nice optimisation.

There may also be a slightly different invalid assumption that the time between the receiver
closing the exchange and the sender sending its last batch is bounded. That seems possible
to solve with sender-side state if the receiver notifies the sender that the receiver was
not present and the sender can infer it was closed cleanly.
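
A minimal sketch of that sender-side idea follows. It is not the patch's design: the names are hypothetical, and the rule used for inferring a clean close (an earlier batch to the same receiver was acknowledged) is my own assumption for illustration.

```python
from enum import Enum


class Reply(Enum):
    OK = 1
    RECEIVER_NOT_PRESENT = 2


class ToySender:
    """Hypothetical sender that remembers whether the receiver was ever seen."""

    def __init__(self):
        self.receiver_seen = False  # sender-side state
        self.done = False

    def handle_reply(self, reply):
        if reply is Reply.OK:
            self.receiver_seen = True
            return "continue"
        # Receiver is absent. If an earlier batch was delivered, assume (for
        # this sketch only) that the receiver closed cleanly and stop sending.
        if self.receiver_seen:
            self.done = True
            return "stop_quietly"
        # Never saw the receiver: cannot tell a clean close from a failure.
        return "error"
```

Note that this only covers a sender that has already delivered at least one batch. Instance 2 of F3 in the example above never has, so it would still hit the error path here.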


Gerrit-MessageType: comment
Gerrit-Change-Id: Ic0b8c1e50678da66ab1547d16530f88b323ed8c1
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Ho <>
Gerrit-Reviewer: Tim Armstrong <>
Gerrit-HasComments: Yes
