impala-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Sailesh Mukil (Code Review)" <ger...@cloudera.org>
Subject [Impala-ASF-CR] IMPALA-4856: Port data stream service to KRPC
Date Fri, 14 Jul 2017 22:10:54 GMT
Sailesh Mukil has uploaded a new patch set (#6).

Change subject: IMPALA-4856: Port data stream service to KRPC
......................................................................

IMPALA-4856: Port data stream service to KRPC

This patch ports the data-flow parts of ImpalaInternalService to KRPC.

* ImpalaInternalService is split into two services. The first,
  ImpalaInternalService, deals with control messages for plan fragment
  instance execution, cancellation and reporting, and remains
  implemented in Thrift for now. The second, DataStreamService, handles
  large-payload RPCs for transmitting runtime filters and row batches
  between hosts.

* In the DataStreamService, all RPCs use 'native' protobuf. The
  DataStreamService starts on the port previously reserved for the
  StatestoreSubscriberService (which is also a KRPC service), to avoid
  having to configure another port when starting Impala. When the
  ImpalaInternalService is ported to KRPC, all services will run on one
  port.

* To support needing to address two different backend services, a data
  service port has been added to TBackendDescriptor.

* This patch adds support for asynchronous RPCs to the RpcMgr and Rpc
  classes. Previously, Impala used fixed size thread pools + synchronous
  RPCs to achieve some parallelism for 'broadcast' RPCs like filter
  propagation, or a dedicated per-sender+receiver pair thread on the
  sender side in the DataStreamSender case. In this patch, the
  PublishFilter() and TransmitData() RPCs are sent asynchronously using
  KRPC's thread pools.

* The TransmitData() protocol has changed to adapt to asynchronous
  RPCs. The full details are in data-stream-mgr.h.

* As a result, DataStreamSender no longer creates a
  thread-per-connection on the sender side.

* Both tuple transmission and runtime filter publication use sidecars to
  minimise the number of copies and serialization steps required.

* Also include a fix for KUDU-2011 that properly allows sidecars to be
  shared between KRPC and the RPC caller (fixing IMPALA-5093, a
  corruption bug).

* A large portion of this patch is the replacement of TRowBatch with its
  Protobuf equivalent, RowBatchPb. The replacement is a literal port of
  the data structure, and row-batch-test, row-batch-list-test and
  row-batch-serialize-benchmark continue to execute without major logic
  changes.

* Simplify FindRecvr() logic in DataStreamManager. No-longer need to
  handle blocking sender-side, so no need for complex promise-based
  machinery. Instead, all senders with no receiver are added to a
  per-receiver list, which is processed when the receiver arrives. If it
  does not arrive promptly, the DataStreamManager cleans them up after
  FLAGS_datastream_sender_timeout_ms.

* This patch also begins a clean-up of how ImpalaServer instances are
  created (by removing CreateImpalaServer), and clarifying the
  relationship between ExecEnv and ImpalaServer. ImpalaServer now
  follows the standard construct->Init()->Start()->Join() lifecycle that
  we use for other services.

* Ensure that all addresses used for KRPCs are fully resolved, avoiding
  the need to resolve them for each RPC.

TESTING
-------

* New tests added to rpc-mgr-test.

TO DO
-----

* Re-enable throughput and latency measurements per data-stream sender
  when that information is exposed from KRPC (KUDU-1738).

* TLS and Kerberos are still not supported by KRPC in this patch.

Change-Id: Ia66704be7a0a8162bb85556d07b583ec756c584b
---
M .clang-format
M CMakeLists.txt
M be/CMakeLists.txt
M be/src/benchmarks/bloom-filter-benchmark.cc
M be/src/benchmarks/row-batch-serialize-benchmark.cc
M be/src/common/init.cc
M be/src/common/status.cc
M be/src/common/status.h
M be/src/exprs/expr-test.cc
M be/src/kudu/rpc/rpc_sidecar.cc
M be/src/kudu/rpc/rpc_sidecar.h
M be/src/rpc/CMakeLists.txt
M be/src/rpc/TAcceptQueueServer.cpp
M be/src/rpc/common.proto
M be/src/rpc/rpc-mgr-test.cc
M be/src/rpc/rpc-mgr.h
M be/src/rpc/rpc-mgr.inline.h
M be/src/rpc/rpc.h
M be/src/runtime/backend-client.h
M be/src/runtime/client-cache-types.h
M be/src/runtime/coordinator-backend-state.cc
M be/src/runtime/coordinator-backend-state.h
M be/src/runtime/coordinator-filter-state.h
M be/src/runtime/coordinator.cc
M be/src/runtime/coordinator.h
M be/src/runtime/data-stream-mgr.cc
M be/src/runtime/data-stream-mgr.h
M be/src/runtime/data-stream-recvr.cc
M be/src/runtime/data-stream-recvr.h
M be/src/runtime/data-stream-sender.cc
M be/src/runtime/data-stream-sender.h
M be/src/runtime/data-stream-test.cc
M be/src/runtime/exec-env.cc
M be/src/runtime/exec-env.h
M be/src/runtime/fragment-instance-state.cc
M be/src/runtime/fragment-instance-state.h
M be/src/runtime/query-state.cc
M be/src/runtime/query-state.h
M be/src/runtime/row-batch-serialize-test.cc
M be/src/runtime/row-batch.cc
M be/src/runtime/row-batch.h
M be/src/runtime/runtime-filter-bank.cc
M be/src/runtime/runtime-filter-bank.h
M be/src/scheduling/backend-config-test.cc
M be/src/scheduling/backend-config.cc
M be/src/scheduling/backend-config.h
M be/src/scheduling/query-schedule.h
M be/src/scheduling/scheduler-test-util.cc
M be/src/scheduling/scheduler-test-util.h
M be/src/scheduling/scheduler.cc
M be/src/scheduling/scheduler.h
M be/src/service/CMakeLists.txt
A be/src/service/data_stream_service.proto
M be/src/service/impala-internal-service.cc
M be/src/service/impala-internal-service.h
M be/src/service/impala-server.cc
M be/src/service/impala-server.h
M be/src/service/impalad-main.cc
M be/src/service/session-expiry-test.cc
M be/src/statestore/statestore-subscriber.cc
M be/src/statestore/statestore-test.cc
M be/src/statestore/statestore.cc
M be/src/testutil/fault-injection-util.h
M be/src/testutil/in-process-servers.cc
M be/src/testutil/in-process-servers.h
M be/src/util/bloom-filter-test.cc
M be/src/util/bloom-filter.cc
M be/src/util/bloom-filter.h
M be/src/util/hdfs-util-test.cc
M be/src/util/network-util.cc
M be/src/util/network-util.h
M be/src/util/thread-pool.h
M bin/impala-config.sh
M common/thrift/CMakeLists.txt
M common/thrift/ImpalaInternalService.thrift
M common/thrift/Results.thrift
M common/thrift/StatestoreService.thrift
M common/thrift/generate_error_codes.py
M tests/custom_cluster/test_breakpad.py
M tests/custom_cluster/test_exchange_delays.py
M tests/custom_cluster/test_rpc_timeout.py
M tests/query_test/test_runtime_filters.py
M tests/stress/test_mini_stress.py
83 files changed, 2,793 insertions(+), 1,980 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/03/7103/6
-- 
To view, visit http://gerrit.cloudera.org:8080/7103
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia66704be7a0a8162bb85556d07b583ec756c584b
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Henry Robinson <henry@cloudera.com>
Gerrit-Reviewer: Henry Robinson <henry@cloudera.com>
Gerrit-Reviewer: Michael Ho <kwho@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sailesh@cloudera.com>

Mime
View raw message