Date: Thu, 2 Mar 2017 23:41:45 +0000 (UTC)
From: "Siddharth Seth (JIRA)"
To: issues@hive.apache.org
Subject: [jira] [Updated] (HIVE-16094) queued containers may timeout if they don't get to run for a long time

[ https://issues.apache.org/jira/browse/HIVE-16094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Siddharth Seth updated HIVE-16094:
----------------------------------
    Attachment: HIVE-16094.01.patch

The problem was that if an AM was picked up by the queueDrainer when it had 0 fragments, it would not be put back in the queue. registerFragment would only add a new entry to the queue if the AM was not already known, so once an AM's entry had been dropped, fragments registered later for that AM were never picked up for heartbeats.

AMNodeInfo instances were originally meant to be used across multiple queries belonging to an AM. We could still achieve that by going back to the old model of reference counting. However, I think it's cleaner to maintain one AMNodeInfo instance per query.

So the patch changes the key to be the queryIdentifier. An AMNodeInfo instance is always kept in the queue, and a heartbeat is only sent if there are pending fragments. The instance is removed from the queue after the query completes, or if an error is hit.

cc [~prasanth_j] for review.
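To make the per-query model concrete, here is a minimal Java sketch of it. It is a hypothetical, single-threaded simplification, not the code in HIVE-16094.01.patch: `PerQueryReporterModel`, `knownQueries`, `unregisterFragment`, `queryComplete`, `drainOnce` and the `Predicate`-based heartbeat hook are illustrative names, a plain `ArrayDeque` stands in for the reporter's timed queue, and each `drainOnce` call stands for one queueDrainer iteration.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;
import java.util.function.Predicate;

// Hypothetical, simplified model of the per-query scheme; NOT Hive's AMReporter.
// Timing is ignored: each drainOnce() call stands for one queueDrainer iteration.
class PerQueryReporterModel {
    // One instance per query (keyed by queryIdentifier), not per AM.
    static final class AMNodeInfo {
        final String queryIdentifier;
        final String amAddress; // assumed: the AM this query's heartbeats go to
        int pendingFragments;

        AMNodeInfo(String queryIdentifier, String amAddress) {
            this.queryIdentifier = queryIdentifier;
            this.amAddress = amAddress;
        }
    }

    final Map<String, AMNodeInfo> knownQueries = new HashMap<>();
    final Queue<AMNodeInfo> heartbeatQueue = new ArrayDeque<>();
    int heartbeatsSent;

    void registerFragment(String queryIdentifier, String amAddress) {
        AMNodeInfo info = knownQueries.get(queryIdentifier);
        if (info == null) {
            info = new AMNodeInfo(queryIdentifier, amAddress);
            knownQueries.put(queryIdentifier, info);
            heartbeatQueue.add(info); // queued once, kept there for the query's lifetime
        }
        info.pendingFragments++;
    }

    // Hypothetical counterpart to registerFragment, called when a fragment finishes.
    void unregisterFragment(String queryIdentifier) {
        AMNodeInfo info = knownQueries.get(queryIdentifier);
        if (info != null && info.pendingFragments > 0) {
            info.pendingFragments--;
        }
    }

    // Query completion: drop the instance from both the map and the queue.
    void queryComplete(String queryIdentifier) {
        AMNodeInfo info = knownQueries.remove(queryIdentifier);
        if (info != null) {
            heartbeatQueue.remove(info);
        }
    }

    // One drainer iteration; sendHeartbeat returns false to simulate an error.
    void drainOnce(Predicate<AMNodeInfo> sendHeartbeat) {
        AMNodeInfo info = heartbeatQueue.poll();
        if (info == null) {
            return;
        }
        if (info.pendingFragments > 0) {
            if (!sendHeartbeat.test(info)) {
                knownQueries.remove(info.queryIdentifier); // error: stop tracking the query
                return;
            }
            heartbeatsSent++;
        }
        heartbeatQueue.add(info); // re-queued even with 0 pending fragments
    }

    public static void main(String[] args) {
        PerQueryReporterModel m = new PerQueryReporterModel();
        m.registerFragment("query-1", "am-1");
        m.unregisterFragment("query-1");
        m.drainOnce(info -> true); // 0 fragments: no heartbeat, entry stays queued
        m.registerFragment("query-1", "am-1");
        m.drainOnce(info -> true); // pending fragment: heartbeat sent
        System.out.println("heartbeats sent: " + m.heartbeatsSent); // prints "heartbeats sent: 1"
    }
}
```

In this model a query's entry stays queued from its first registerFragment until queryComplete or a failed heartbeat, so a fragment registered after an idle period is picked up on the next iteration. The trade-off is an extra poll per iteration for queries that have nothing pending.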
> queued containers may timeout if they don't get to run for a long time
> ----------------------------------------------------------------------
>
>                 Key: HIVE-16094
>                 URL: https://issues.apache.org/jira/browse/HIVE-16094
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 2.2.0
>            Reporter: Siddharth Seth
>            Assignee: Siddharth Seth
>            Priority: Critical
>         Attachments: HIVE-16094.01.patch
>
>
> I believe this happened after HIVE-15958, since we end up keeping amNodeInfo in knownAppMasters, and that can result in the callable not being scheduled on new task registration.
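The failure mode in the description can be reproduced with a small, hypothetical Java model. It is not Hive's AMReporter: `PrePatchReporterModel`, `unregisterFragment`, `drainOnce` and the String AM key are assumptions, a plain `ArrayDeque` stands in for the real timed queue, and timing is ignored. It shows how an AMNodeInfo that stays in knownAppMasters after being dropped from the queue stops receiving heartbeats even when new fragments are registered.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;

// Hypothetical, simplified model of the pre-patch behavior; NOT Hive's AMReporter.
// Timing is ignored: each drainOnce() call stands for one queueDrainer iteration.
class PrePatchReporterModel {
    // One instance per AM (keyed by an assumed AM address string).
    static final class AMNodeInfo {
        final String amAddress;
        int pendingFragments;

        AMNodeInfo(String amAddress) {
            this.amAddress = amAddress;
        }
    }

    final Map<String, AMNodeInfo> knownAppMasters = new HashMap<>();
    final Queue<AMNodeInfo> heartbeatQueue = new ArrayDeque<>();
    int heartbeatsSent;

    void registerFragment(String amAddress) {
        AMNodeInfo info = knownAppMasters.get(amAddress);
        if (info == null) {
            info = new AMNodeInfo(amAddress);
            knownAppMasters.put(amAddress, info);
            heartbeatQueue.add(info); // only an unknown AM gets queued
        }
        info.pendingFragments++;
    }

    // Hypothetical counterpart to registerFragment; the entry stays in knownAppMasters.
    void unregisterFragment(String amAddress) {
        AMNodeInfo info = knownAppMasters.get(amAddress);
        if (info != null && info.pendingFragments > 0) {
            info.pendingFragments--;
        }
    }

    void drainOnce() {
        AMNodeInfo info = heartbeatQueue.poll();
        if (info == null) {
            return;
        }
        if (info.pendingFragments > 0) {
            heartbeatsSent++;
            heartbeatQueue.add(info); // re-queued only while it has work
        }
        // With 0 fragments the entry leaves the queue but stays in knownAppMasters.
    }

    public static void main(String[] args) {
        PrePatchReporterModel m = new PrePatchReporterModel();
        m.registerFragment("am-1");
        m.unregisterFragment("am-1");
        m.drainOnce(); // 0 fragments: entry dropped from the queue
        m.registerFragment("am-1"); // AM already known: not re-queued
        m.drainOnce(); // queue is empty, so nothing is sent
        System.out.println("heartbeats sent: " + m.heartbeatsSent); // prints "heartbeats sent: 0"
    }
}
```

After the second registerFragment the model has a pending fragment but no queued entry, so no heartbeat is ever sent for it; in the real system that silence is what lets a long-queued container time out.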