impala-reviews mailing list archives

From "Michael Ho (Code Review)" <ger...@cloudera.org>
Subject [Impala-ASF-CR] IMPALA-4856: Port data stream service to KRPC
Date Sat, 04 Nov 2017 04:00:00 GMT
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/8023 )

Change subject: IMPALA-4856: Port data stream service to KRPC
......................................................................


Patch Set 7:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/8023/7/be/src/runtime/krpc-data-stream-mgr.cc
File be/src/runtime/krpc-data-stream-mgr.cc:

http://gerrit.cloudera.org:8080/#/c/8023/7/be/src/runtime/krpc-data-stream-mgr.cc@199
PS7, Line 199:   DeserializeTask payload =
             :       {DeserializeTaskType::EARLY_SENDERS, finst_id, dest_node_id, 0};
             :   deserialize_pool_.Offer(move(payload));
> doesn't this mean we make early sender draining single threaded? shoudl we 
Yes. Actually, I just realized that early senders may also be overtaken by newly arriving row batches
before they are deserialized by the deserialization thread pool, which extends the response time
for the early senders.

A simpler scheme would be to put the early senders directly into the 'deferred_batches' queue
of the corresponding sender queue so that new incoming row batches cannot overtake them.

Regarding the parallelism for draining the 'deferred_batches' queue, one simple approach is
to enqueue as many deserialization requests as there are entries in the 'deferred_batches'
queue. The deserialization thread then peeks at the first entry of the queue and tries to
insert it into 'batch_queue' if there is space; an entry is popped off the queue only after
it has been inserted successfully. This may be wasteful if 'batch_queue' fills up before all
deserialization requests are drained, but hopefully the peeking logic is cheap. We could be
fancier and record the deserialized size of each entry in 'deferred_batches' to determine how
many entries can be popped off without going over the limit.
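To make the peek-then-pop scheme concrete, here is a minimal toy sketch. The names (ToyReceiver, Batch, DrainDeferredBatches, the soft limit) are illustrative only, not Impala's actual classes; the point is that an entry is popped off 'deferred_batches' only after it has been successfully inserted into 'batch_queue':

```cpp
#include <cstdint>
#include <deque>

// Toy stand-in for a serialized row batch; 'size' is its byte size.
struct Batch { int64_t size; };

// Hypothetical receiver with a deferred queue and a bounded batch queue.
struct ToyReceiver {
  std::deque<Batch> deferred_batches;  // batches waiting to be admitted
  std::deque<Batch> batch_queue;       // batches admitted for consumption
  int64_t queued_bytes = 0;            // bytes currently in batch_queue
  int64_t soft_limit;                  // memory limit for batch_queue

  explicit ToyReceiver(int64_t limit) : soft_limit(limit) {}

  // One deserialization-thread pass: peek at the front entry, insert it
  // into batch_queue only if it fits under the soft limit, and pop it
  // off deferred_batches only after the insert succeeds. Returns the
  // number of entries drained in this pass.
  int DrainDeferredBatches() {
    int drained = 0;
    while (!deferred_batches.empty()) {
      const Batch& front = deferred_batches.front();        // peek only
      if (queued_bytes + front.size > soft_limit) break;    // no space yet
      batch_queue.push_back(front);
      queued_bytes += front.size;
      deferred_batches.pop_front();  // pop only after successful insert
      ++drained;
    }
    return drained;
  }
};
```

With a 100-byte limit and three 40-byte deferred batches, a single pass admits two batches and leaves the third deferred until the consumer frees space, which is the "wasteful but cheap" case described above.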


http://gerrit.cloudera.org:8080/#/c/8023/7/be/src/runtime/krpc-data-stream-mgr.cc@235
PS7, Line 235:     for (const unique_ptr<TransmitDataCtx>& ctx : early_senders.waiting_sender_ctxs) {
> shouldn't we process waiting_sender_ctxs before closed_sender_ctxs? Otherwi
Yes, it's impossible for the same sender to be in both queues at the same time, but I can switch
the order if that's easier to understand.



-- 
To view, visit http://gerrit.cloudera.org:8080/8023
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic0b8c1e50678da66ab1547d16530f88b323ed8c1
Gerrit-Change-Number: 8023
Gerrit-PatchSet: 7
Gerrit-Owner: Michael Ho <kwho@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dhecht@cloudera.com>
Gerrit-Reviewer: Michael Ho <kwho@cloudera.com>
Gerrit-Reviewer: Mostafa Mokhtar <mmokhtar@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sailesh@cloudera.com>
Gerrit-Comment-Date: Sat, 04 Nov 2017 04:00:00 +0000
Gerrit-HasComments: Yes
