Date: Wed, 30 Aug 2017 00:52:00 +0000 (UTC)
From: "ASF GitHub Bot (JIRA)"
To: issues@drill.apache.org
Reply-To: dev@drill.apache.org
Subject: [jira] [Commented] (DRILL-5721) Query with only root fragment and no non-root fragment hangs when Drillbit to Drillbit Control Connection has network issues
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8

    [ https://issues.apache.org/jira/browse/DRILL-5721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16146417#comment-16146417 ]

ASF GitHub Bot commented on DRILL-5721:
---------------------------------------

Github user parthchandra commented on a diff in the pull request:

    https://github.com/apache/drill/pull/919#discussion_r135937433

    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/work/foreman/Foreman.java ---
    @@ -1073,26 +1070,22 @@ public QueryId getQueryId() {
        */
       private void setupRootFragment(final PlanFragment rootFragment, final FragmentRoot rootOperator)
           throws ExecutionSetupException {
    -    @SuppressWarnings("resource")
         final FragmentContext rootContext = new FragmentContext(drillbitContext, rootFragment, queryContext,
             initiatingClient, drillbitContext.getFunctionImplementationRegistry());
    -    @SuppressWarnings("resource")
    -    final IncomingBuffers buffers = new IncomingBuffers(rootFragment, rootContext);
    -    rootContext.setBuffers(buffers);
    -
    -    queryManager.addFragmentStatusTracker(rootFragment, true);
    -
         final ControlTunnel tunnel = drillbitContext.getController().getTunnel(queryContext.getCurrentEndpoint());
    +    final FragmentStatusReporter statusReporter = new FragmentStatusReporter(rootContext, tunnel);
         final FragmentExecutor rootRunner = new FragmentExecutor(rootContext, rootFragment,
    -        new FragmentStatusReporter(rootContext, tunnel),
    -        rootOperator);
    -    final RootFragmentManager fragmentManager = new RootFragmentManager(rootFragment.getHandle(), buffers, rootRunner);
    +        statusReporter, rootOperator);

    -    if (buffers.isDone()) {
    +    queryManager.addFragmentStatusTracker(rootFragment, true);
    +
    +    // FragmentManager is setting buffer for FragmentContext
    +    if (rootContext.isBuffersDone()) {
    --- End diff --

    I don't see where rootContext.setBuffers is being called to set the buffers
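For context on the question above: the inline comment in the new code ("FragmentManager is setting buffer for FragmentContext") suggests the buffer wiring moved out of Foreman.setupRootFragment() and into the fragment manager. What follows is a minimal, hypothetical sketch of that pattern, not Drill's actual code: the *Sketch classes and the expectedIncomingStreams parameter are invented stand-ins, while setBuffers()/isBuffersDone(), IncomingBuffers, and RootFragmentManager are simply the names visible in the diff.

    // Hypothetical stand-ins only; this is not the Apache Drill implementation.
    class IncomingBuffersSketch {
      private final int expectedStreams;
      private int arrivedStreams;           // would be incremented as data batches arrive

      IncomingBuffersSketch(int expectedStreams) {
        this.expectedStreams = expectedStreams;
      }

      boolean isDone() {
        // A root-only fragment expects no incoming streams, so it is done immediately.
        return arrivedStreams >= expectedStreams;
      }
    }

    class FragmentContextSketch {
      private IncomingBuffersSketch buffers;

      void setBuffers(IncomingBuffersSketch buffers) {
        this.buffers = buffers;
      }

      boolean isBuffersDone() {
        // Delegates to whatever buffers the manager installed.
        return buffers != null && buffers.isDone();
      }
    }

    class RootFragmentManagerSketch {
      RootFragmentManagerSketch(FragmentContextSketch context, int expectedIncomingStreams) {
        // If the new design follows the inline comment, this is where setBuffers()
        // would now be called, rather than in Foreman.setupRootFragment().
        context.setBuffers(new IncomingBuffersSketch(expectedIncomingStreams));
      }
    }

Under that assumption, a root fragment has zero incoming streams, so isBuffersDone() returns true as soon as the manager installs the buffers, which is the branch the review comment is attached to.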
> Query with only root fragment and no non-root fragment hangs when Drillbit to Drillbit Control Connection has network issues
> ----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: DRILL-5721
>                 URL: https://issues.apache.org/jira/browse/DRILL-5721
>             Project: Apache Drill
>          Issue Type: Bug
>            Reporter: Sorabh Hamirwasia
>            Assignee: Sorabh Hamirwasia
>             Fix For: 1.12.0
>
>
> Recently I found an issue (thanks to [~knguyen] for creating this scenario) related to fragment status reporting and would like some feedback on it.
> When a client submits a query to the Foreman, the query is planned by the Foreman and the resulting fragments are scheduled on root and non-root nodes. The Foreman creates a DrillbitStatusListener and a FragmentStatusListener to learn about the health of a Drillbit node and of a fragment, respectively. Root and non-root fragments are set up by the Foreman in different ways:
> Root fragments are set up without any communication over the control channel (they execute locally on the Foreman).
> Non-root fragments are set up by sending a control message (REQ_INITIALIZE_FRAGMENTS_VALUE) over the wire. If sending any such control message fails during query setup (for example, due to a network hiccup), the query is failed and the client is notified.
> Each fragment is executed on its node by a FragmentExecutor, which holds an instance of FragmentStatusReporter. FragmentStatusReporter updates the Foreman node about the status of a fragment over a control tunnel or connection using an RPC message (REQ_FRAGMENT_STATUS), for both root and non-root fragments.
> So a root fragment is set up locally without any RPC communication, yet when the fragment executor reports that fragment's status, the report goes over a control connection as an RPC message. For a non-root fragment, both setup and status updates go over the control connection as RPC messages.
> *Issue 1:*
> What was observed is that for a simple query with only one root fragment running on the Foreman node, setup works fine. But if, during the status update, the fragment tries to create a control connection and fails to establish it, the query hangs. This is because the root fragment completes execution but fails to update the Foreman about it, so the Foreman thinks the query is running forever.
> *Proposed Solution:*
> Root fragment setup happens locally without an RPC message, so we can do the same for root fragment status updates. This avoids RPC communication for status updates of fragments running locally on the Foreman and hence resolves issue 1.
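The proposed solution amounts to short-circuiting the status RPC when the reporting fragment already runs on the Foreman node. Below is a rough, hypothetical sketch of that idea only; StatusSink, ForemanLocalSink, ControlTunnelSink, and StatusReporterSketch are invented names for illustration, and REQ_FRAGMENT_STATUS is referenced from the description above rather than modeled.

    // Hypothetical sketch of the proposed direction, not the actual Drill patch.
    interface StatusSink {
      void statusUpdate(String fragmentId, String state);
    }

    // Delivers the update in-process on the Foreman; no control connection is
    // opened, so a network failure can no longer hide a completed root fragment.
    class ForemanLocalSink implements StatusSink {
      @Override
      public void statusUpdate(String fragmentId, String state) {
        System.out.println("local update: " + fragmentId + " -> " + state);
      }
    }

    // Stands in for sending REQ_FRAGMENT_STATUS over the control tunnel, as is
    // done today for both root and non-root fragments.
    class ControlTunnelSink implements StatusSink {
      @Override
      public void statusUpdate(String fragmentId, String state) {
        // ... send the RPC message over the wire ...
      }
    }

    class StatusReporterSketch {
      private final StatusSink sink;

      StatusReporterSketch(boolean fragmentRunsOnForeman) {
        // Root fragments executing on the Foreman report locally; everything
        // else keeps using the control tunnel.
        this.sink = fragmentRunsOnForeman ? new ForemanLocalSink() : new ControlTunnelSink();
      }

      void report(String fragmentId, String state) {
        sink.statusUpdate(fragmentId, state);
      }
    }

With this split, the hang described in issue 1 cannot occur for a query that has only a root fragment, because the fragment's terminal state reaches the Foreman without touching the network.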
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)