impala-reviews mailing list archives

From "Sailesh Mukil (Code Review)" <ger...@cloudera.org>
Subject [Impala-ASF-CR] IMPALA-5749: coordinator race hits DCHECK 'num_remaining_backends_ > 0'
Date Thu, 03 Aug 2017 20:14:56 GMT
Sailesh Mukil has posted comments on this change.

Change subject: IMPALA-5749: coordinator race hits DCHECK 'num_remaining_backends_ > 0'
......................................................................


Patch Set 1:

> Does this trigger only when there are two concurrent calls to
 > UpdateBackendExecStatus() from the same backend? If so, do we
 > understand why that happens so often?

My understanding is this:
A fragment instance sends a status report every 'n' seconds. On a congested network, two of these
reports for the same fragment instance can arrive at the coordinator and begin being processed
at about the same time, which triggers this race.
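To make the failure mode concrete, here is a minimal sketch of a check-then-act guard on the coordinator side. It assumes the double decrement comes from two reports for the same backend both being treated as that backend's final report. `BackendStateSketch`, `CoordinatorSketch` and `SimulateDuplicateFinalReports` are hypothetical names, not Impala's classes, and this is not the change under review. The point it shows: the "this report finished the backend" decision is made under the per-backend lock, so only one of two concurrent duplicate reports can trigger the decrement.

```cpp
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical, simplified per-backend state on the coordinator.
class BackendStateSketch {
 public:
  // Applies a status report. Returns true only for the report that moves
  // this backend from "running" to "done". The check and the state change
  // happen under one lock, so two concurrent final reports cannot both win.
  bool ApplyReport(bool report_says_done) {
    std::lock_guard<std::mutex> l(lock_);
    if (done_ || !report_says_done) return false;
    done_ = true;
    return true;
  }

 private:
  std::mutex lock_;
  bool done_ = false;
};

// Hypothetical coordinator that counts backends still running.
class CoordinatorSketch {
 public:
  explicit CoordinatorSketch(int num_backends)
      : backends_(num_backends), num_remaining_backends_(num_backends) {}

  void UpdateBackendStatus(int backend_idx, bool report_says_done) {
    // Decrement only if this particular report completed the backend.
    if (!backends_[backend_idx].ApplyReport(report_says_done)) return;
    std::lock_guard<std::mutex> l(lock_);
    // Stand-in for the DCHECK named in the subject line.
    assert(num_remaining_backends_ > 0);
    --num_remaining_backends_;
  }

  int num_remaining_backends() {
    std::lock_guard<std::mutex> l(lock_);
    return num_remaining_backends_;
  }

 private:
  std::vector<BackendStateSketch> backends_;  // never resized after construction
  std::mutex lock_;
  int num_remaining_backends_;
};

// Delivers a "done" report for every backend from two threads at once,
// mimicking a periodic report racing with a second, final report.
// Returns the remaining-backend count after both threads finish.
int SimulateDuplicateFinalReports(int num_backends) {
  CoordinatorSketch coord(num_backends);
  auto deliver_all = [&coord, num_backends] {
    for (int i = 0; i < num_backends; ++i) coord.UpdateBackendStatus(i, true);
  };
  std::thread periodic(deliver_all);
  std::thread final_report(deliver_all);
  periodic.join();
  final_report.join();
  return coord.num_remaining_backends();
}
```

If `ApplyReport` instead released its lock and the caller re-checked done-ness afterwards, both threads could observe the backend as done and decrement twice, which would trip a check like the one in the subject line.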

Normally a second report cannot be sent until the first one has been ACKed by the coordinator,
because ReportProfileThread() holds a lock until the report is ACKed. There is one exception, where
a second report is sent before the first one has been responded to: the report sent from FragmentInstanceState::Finalize().

So ReportProfileThread() sends a report for the last finstance, and Finalize() then sends a
second report for the same finstance before the coordinator has responded to the first.
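The sender-side ordering described above can be modelled as follows. This is a minimal sketch under stated assumptions: `FinstanceReporterSketch`, `PeriodicReport`, `FinalReport` and `PeakReportsInFlight` are hypothetical names, the ACK round trip is simulated with a sleep, and the message does not say whether serializing the final report on the sender side is the fix chosen in this change. It only shows that a lock held until the ACK serializes reports, and that a report path which bypasses the lock can overlap with one still awaiting its ACK.

```cpp
#include <atomic>
#include <chrono>
#include <mutex>
#include <thread>

// Hypothetical model of the reporting side of one fragment instance.
class FinstanceReporterSketch {
 public:
  // Periodic path: holds report_lock_ until the simulated ACK arrives,
  // so two periodic reports can never be outstanding at once.
  void PeriodicReport() {
    std::lock_guard<std::mutex> l(report_lock_);
    SendAndWaitForAck();
  }

  // Final path. Skipping report_lock_ lets it go out while a periodic
  // report is still awaiting its ACK; taking the lock serializes them.
  void FinalReport(bool take_report_lock) {
    if (take_report_lock) {
      std::lock_guard<std::mutex> l(report_lock_);
      SendAndWaitForAck();
    } else {
      SendAndWaitForAck();
    }
  }

  int max_in_flight() const { return max_in_flight_.load(); }

 private:
  void SendAndWaitForAck() {
    int now = ++in_flight_;
    // Track the peak number of simultaneously outstanding reports.
    int prev = max_in_flight_.load();
    while (now > prev && !max_in_flight_.compare_exchange_weak(prev, now)) {
    }
    // Simulated network round trip before the coordinator's ACK.
    std::this_thread::sleep_for(std::chrono::milliseconds(20));
    --in_flight_;
  }

  std::mutex report_lock_;
  std::atomic<int> in_flight_{0};
  std::atomic<int> max_in_flight_{0};
};

// Runs one periodic report and one final report concurrently and returns
// the peak number of reports that were awaiting an ACK at the same time.
int PeakReportsInFlight(bool final_report_takes_lock) {
  FinstanceReporterSketch reporter;
  std::thread periodic([&reporter] { reporter.PeriodicReport(); });
  std::thread final_report([&reporter, final_report_takes_lock] {
    reporter.FinalReport(final_report_takes_lock);
  });
  periodic.join();
  final_report.join();
  return reporter.max_in_flight();
}
```

With the lock taken on both paths the peak is always 1. Without it the peak is timing-dependent (1 or 2), which matches the report describing the overlap as something a congested network makes likely rather than certain.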

-- 
To view, visit http://gerrit.cloudera.org:8080/7577

Gerrit-MessageType: comment
Gerrit-Change-Id: I1528661e5df6d9732ebfeb414576c82ec5c92241
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall <tmarshall@cloudera.com>
Gerrit-Reviewer: Henry Robinson <henry@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sailesh@cloudera.com>
Gerrit-HasComments: No
