impala-dev mailing list archives

From "Sailesh Mukil (Code Review)" <ger...@cloudera.org>
Subject [Impala-CR](cdh5-2.5.0_5.7.0) IMPALA-2592: DataStreamSender::Channel::CloseInternal() does not close the channel on an error.
Date Wed, 02 Mar 2016 00:08:03 GMT
Hello Silvius Rus, Dan Hecht,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/2205

to look at the new patch set (#11).

Change subject: IMPALA-2592: DataStreamSender::Channel::CloseInternal() does not close the
channel on an error.
......................................................................

IMPALA-2592: DataStreamSender::Channel::CloseInternal() does not close the channel on an error.

CloseInternal() did not send an EOS RPC if there was an error in
sending the previous RPC or an error in flushing the current row
batch; it just logged the error. As a result, the receiver on the
RPC channel remained open, causing the query to hang in the
cluster.
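
In rough outline, the pre-fix flow looked like the sketch below; the
member and helper names are invented for illustration and are not
the actual Channel code:

    // Illustrative only; member and helper names are invented.
    void DataStreamSender::Channel::CloseInternal() {
      if (!last_rpc_status_.ok() || !FlushCurrentBatch().ok()) {
        // Old behaviour: just log and return early. The EOS RPC below
        // is never sent, so the receiver stays open and the query hangs.
        LOG(ERROR) << "error before close; skipping EOS";
        return;
      }
      SendEosRpc();  // only reached on the error-free path
    }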

This fix adds a new function, FlushFinal(), to the DataSink class.
All classes that inherit from DataSink (DataStreamSender included)
implement this function. It is currently a no-op in every class
other than DataStreamSender; this will change in follow-up patches.
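
In outline, the new hook looks like the following sketch; this is
not the real be/src/exec/data-sink.h, and Status/RuntimeState are
shown as minimal stand-ins for the Impala types:

    struct RuntimeState;                        // opaque in this sketch
    struct Status {                             // minimal stand-in
      static Status OK() { return Status(); }
      bool ok() const { return true; }
    };

    class DataSink {
     public:
      virtual ~DataSink() {}
      // New hook added by this patch: flush buffered work and surface
      // any error before Close(). A no-op default keeps the sketch
      // short; in the patch each sink provides its own implementation.
      virtual Status FlushFinal(RuntimeState* state) { return Status::OK(); }
      // Teardown only; after this change it is not expected to fail.
      virtual void Close(RuntimeState* state) = 0;
    };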

FlushFinal() can return a status, and all the calls that could fail
in DataStreamSender::Close() have been moved into FlushFinal(),
leaving Close() as a pure teardown function that cannot fail.
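
The resulting split is sketched below; FlushLastBatch(), SendEos()
and ReleaseChannels() are placeholder names, not the actual methods:

    Status DataStreamSender::FlushFinal(RuntimeState* state) {
      // Everything that can fail happens here, where the caller sees it.
      RETURN_IF_ERROR(FlushLastBatch(state));   // flush the last row batch
      RETURN_IF_ERROR(SendEos(state));          // EOS RPC on every channel
      return Status::OK();
    }

    void DataStreamSender::Close(RuntimeState* state) {
      // Pure teardown: release channels and buffers; nothing here fails.
      ReleaseChannels();
    }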

Because FlushFinal() can return an error status, errors are now
propagated to the PlanFragmentExecutor, which forwards them to the
coordinator in a report via ReportStatusCb(). The coordinator then
cancels the query when it receives a report containing an error,
which is how the fix eliminates the cluster hang.
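
The reporting path is roughly the following; FlushAndReport() and
SendReport() are made-up names for the executor plumbing, shown only
to illustrate how the status reaches the coordinator:

    Status PlanFragmentExecutor::FlushAndReport(RuntimeState* state) {
      Status status = sink_->FlushFinal(state);
      if (!status.ok()) {
        // The report carries this status to the coordinator via
        // ReportStatusCb(); the coordinator cancels the query instead
        // of waiting forever on an open receiver.
        SendReport(/* done */ true, status);
      }
      return status;
    }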

Furthermore, a retry loop is added around the report RPC in
FragmentExecState::ReportStatusCb() in case the coordinator is
temporarily unreachable. If this RPC still cannot be sent, the
cluster could hang; that case will be addressed in a separate
patch.
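
In spirit, the retry is a bounded loop around the report RPC; the
attempt count, back-off, and TryReportToCoordinator() below are
illustrative only:

    Status status;
    for (int attempt = 0; attempt < 3; ++attempt) {
      status = TryReportToCoordinator();   // hypothetical report RPC
      if (status.ok()) break;
      SleepForMs(100);                     // brief back-off before retrying
    }
    // If every attempt fails the hang risk remains; a follow-up patch
    // will handle that case.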

Change-Id: Ia45d45dedb21c8b8abab266e140766eddfa363fb
---
M be/src/exec/data-sink.h
M be/src/exec/hbase-table-sink.cc
M be/src/exec/hbase-table-sink.h
M be/src/exec/hdfs-table-sink.cc
M be/src/exec/hdfs-table-sink.h
M be/src/runtime/data-stream-sender.cc
M be/src/runtime/data-stream-sender.h
M be/src/runtime/data-stream-test.cc
M be/src/runtime/plan-fragment-executor.cc
M be/src/service/fragment-exec-state.cc
10 files changed, 105 insertions(+), 38 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala refs/changes/05/2205/11
-- 
To view, visit http://gerrit.cloudera.org:8080/2205
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia45d45dedb21c8b8abab266e140766eddfa363fb
Gerrit-PatchSet: 11
Gerrit-Project: Impala
Gerrit-Branch: cdh5-2.5.0_5.7.0
Gerrit-Owner: Sailesh Mukil <sailesh@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dhecht@cloudera.com>
Gerrit-Reviewer: Henry Robinson <henry@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sailesh@cloudera.com>
Gerrit-Reviewer: Silvius Rus <srus@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstrong@cloudera.com>
