Return-Path: X-Original-To: archive-asf-public-internal@cust-asf2.ponee.io Delivered-To: archive-asf-public-internal@cust-asf2.ponee.io Received: from cust-asf.ponee.io (cust-asf.ponee.io [163.172.22.183]) by cust-asf2.ponee.io (Postfix) with ESMTP id 2D6BA200BE2 for ; Thu, 15 Dec 2016 23:35:43 +0100 (CET) Received: by cust-asf.ponee.io (Postfix) id 2BEBF160B15; Thu, 15 Dec 2016 22:35:43 +0000 (UTC) Delivered-To: archive-asf-public@cust-asf.ponee.io Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by cust-asf.ponee.io (Postfix) with SMTP id 723ED160B13 for ; Thu, 15 Dec 2016 23:35:42 +0100 (CET) Received: (qmail 29643 invoked by uid 500); 15 Dec 2016 22:35:41 -0000 Mailing-List: contact reviews-help@impala.incubator.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Delivered-To: mailing list reviews@impala.incubator.apache.org Received: (qmail 29632 invoked by uid 99); 15 Dec 2016 22:35:41 -0000 Received: from pnap-us-west-generic-nat.apache.org (HELO spamd2-us-west.apache.org) (209.188.14.142) by apache.org (qpsmtpd/0.29) with ESMTP; Thu, 15 Dec 2016 22:35:41 +0000 Received: from localhost (localhost [127.0.0.1]) by spamd2-us-west.apache.org (ASF Mail Server at spamd2-us-west.apache.org) with ESMTP id E147C1A7AD7 for ; Thu, 15 Dec 2016 22:35:40 +0000 (UTC) X-Virus-Scanned: Debian amavisd-new at spamd2-us-west.apache.org X-Spam-Flag: NO X-Spam-Score: 0.363 X-Spam-Level: X-Spam-Status: No, score=0.363 tagged_above=-999 required=6.31 tests=[RDNS_DYNAMIC=0.363, SPF_PASS=-0.001, URIBL_BLOCKED=0.001] autolearn=disabled Received: from mx1-lw-eu.apache.org ([10.40.0.8]) by localhost (spamd2-us-west.apache.org [10.40.0.9]) (amavisd-new, port 10024) with ESMTP id 1JWYrSDB_to5 for ; Thu, 15 Dec 2016 22:35:38 +0000 (UTC) Received: from ip-10-146-233-104.ec2.internal (ec2-75-101-130-251.compute-1.amazonaws.com [75.101.130.251]) by mx1-lw-eu.apache.org (ASF Mail Server at mx1-lw-eu.apache.org) with ESMTPS id 292B65F5F8 for ; Thu, 15 Dec 2016 22:35:38 +0000 (UTC) Received: from localhost (localhost [127.0.0.1]) by ip-10-146-233-104.ec2.internal (8.14.4/8.14.4) with ESMTP id uBFMZ5TZ012490; Thu, 15 Dec 2016 22:35:06 GMT Message-Id: <201612152235.uBFMZ5TZ012490@ip-10-146-233-104.ec2.internal> Date: Thu, 15 Dec 2016 22:35:05 +0000 From: "Henry Robinson (Code Review)" To: Sailesh Mukil , impala-cr@cloudera.com, reviews@impala.incubator.apache.org CC: Lars Volker Reply-To: henry@cloudera.com X-Gerrit-MessageType: comment Subject: =?UTF-8?Q?=5BImpala-ASF-CR=5D_IMPALA-3977=3A_TransmitData=28=29_should_not_block=0A?= X-Gerrit-Change-Id: I47f9d6c1de95a72ab73471dcd2a3118ef13e7db0 X-Gerrit-ChangeURL: X-Gerrit-Commit: 12731dad2dcecf0b573f97b62a03a840841eba2a In-Reply-To: References: MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Content-Disposition: inline User-Agent: Gerrit/2.12.2 archived-at: Thu, 15 Dec 2016 22:35:43 -0000 Henry Robinson has posted comments on this change. Change subject: IMPALA-3977: TransmitData() should not block ...................................................................... Patch Set 1: (3 comments) http://gerrit.cloudera.org:8080/#/c/5491/1/be/src/runtime/data-stream-mgr.cc File be/src/runtime/data-stream-mgr.cc: PS1, Line 53: STREAM_EXPIRATION_TIME_MS > I've thought about this for a while and it makes sense that the cache is no I think we can get the same effect as the stream cache without the complexity. The key is distinguishing between all the different states: If a receiver is not present, return EAGAIN. Either a) this is the first batch, so try again or b) we've already successfully sent > 0 batches, so the sender knows the receiver has gone away. The case you describe, where there's an upstream limit, could lead to a sender retrying if it never contributed a batch to the limit. But that's not very common. If we have concerns about the network usage of retrying, we can always fall back to probing without a payload. If the receiver is present, but the queue is full, return a different status code to distinguish from the case where the receiver is not present. This allows us to fail fast in the former case where the receiver has been torn down but we've already sent a batch, so we know it was once there. http://gerrit.cloudera.org:8080/#/c/5491/1/be/src/runtime/data-stream-sender.cc File be/src/runtime/data-stream-sender.cc: PS1, Line 209: TTransmitDataResult res; > TTransmitDataResult has only one field, i.e. Status. Don't rely on TTransmitDataResult not changing. There's no reason L224+ have to be outside the loop. PS1, Line 217: DoTransmitDataRpc > I like the idea, but isn't that optimizing for a worse-case scenario? Keepi Yes, I think the sender needs to time-out of AddBatch(). Otherwise it could get blocked in the queue and hold a service pool thread. I'm ok with waiting to see if the probe request has a benefit, but I still think we'll need to distinguish the first RPC from all subsequent ones to distinguish between "receiver is not there yet" from "receiver was here and has disappeared" conditions (see other comments for details). I'd expect the probe to take a few ms at most, and the network capacity for those should be effectively infinite given our requirements. The capacity for full batch sends might be smaller, which affects the latency - I'm not sure we can say that throughput is not an issue. Anyhow, you can defer that change for now. -- To view, visit http://gerrit.cloudera.org:8080/5491 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I47f9d6c1de95a72ab73471dcd2a3118ef13e7db0 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Sailesh Mukil Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Sailesh Mukil Gerrit-HasComments: Yes