Date: Fri, 14 Jul 2017 03:17:00 +0000 (UTC)
From: "Sushanth Sowmyan (JIRA)"
To: dev@hive.apache.org
Reply-To: dev@hive.apache.org
Subject: [jira] [Created] (HIVE-17095) Long chain repl loads do not complete in a timely fashion

Sushanth Sowmyan created HIVE-17095:
---------------------------------------

             Summary: Long chain repl loads do not complete in a timely fashion
                 Key: HIVE-17095
                 URL: https://issues.apache.org/jira/browse/HIVE-17095
             Project: Hive
          Issue Type: Bug
          Components: Query Planning, repl
            Reporter: sapin amin
            Assignee: Sushanth Sowmyan

Per performance testing done by [~sapinamin] (thus, I'm setting him as reporter), we were able to discover an important bug affecting replication. It also has the potential to affect other large DAGs of Tasks that Hive generates, if those DAGs have multiple paths to the same child Task nodes.
Basically, we find that incremental REPL LOAD does not finish in a timely fashion. The test in this case was to add 400 partitions and replicate them. Each partition produced an ADD PTN and an ALTER PTN event. For each ADD PTN we generate a DDLTask, a CopyTask and a MoveTask; for each ALTER PTN, a single DDLTask. Order of execution matters, so dependency collection tasks are chained in between phases.

Trying to root-cause this shows that it seems to stall forever at Driver instantiation time, and it almost looks like the thread never proceeds past that point. Looking at the logs, the way this is written, it walks every task in the subtree of every node without checking for duplicates, and it does this simply to get the number of execution tasks! The task visitor therefore visits every subtree of every node, which is fine if your graphs are open trees, but is horrible for us, since we have a dependency collection task between each phase.

Effectively, this is what's happening. We have a DAG like this:

4 tasks in parallel -> DEP col -> 4 tasks in parallel -> DEP col -> ...

This means that for each of the 4 root tasks, we do a full traversal of the entire subgraph past the DEP col (not just a single visit to each node), and this happens recursively, which leads to exponential growth in the number of task visits as the length and breadth of the graph increase. In our case we had about 800 tasks in the graph, with a width of roughly 2-3 and about 200 stages, with a dependency collection before and after each stage. That means the leaf nodes of this DAG can be reached in somewhere between 2^200 and 3^200 distinct ways, and we would visit them via every one of those ways. And all of this is simply to count the number of tasks to schedule - we then revisit this function several more times, once per hook, once for the MapReduceCompiler and once for the TaskCompiler.

We have not been sending such large DAGs to the Driver before, so it has not been a problem yet, and there are upcoming changes that reduce the number of tasks replication generates (as part of a memory addressing issue), but we should still fix the way we do Task traversal so that a large DAG cannot cripple us.
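For illustration only, here is a minimal, self-contained sketch of the behaviour described above. The Task class and method names here are hypothetical stand-ins, not Hive's actual org.apache.hadoop.hive.ql.exec.Task API; the point is just to show how a duplicate-blind walk over a "parallel tasks -> DEP col" chain explodes, while tracking visited tasks keeps the count linear.

{code:java}
import java.util.*;

public class TaskTraversalSketch {

    // Hypothetical stand-in for an execution task with child dependencies.
    static class Task {
        final String name;
        final List<Task> children = new ArrayList<>();
        Task(String name) { this.name = name; }
    }

    // Naive walk: a shared child is re-walked once per path that reaches it,
    // so visits grow exponentially with the number of DEP col stages.
    static long naiveVisitCount(Task t) {
        long visits = 1;
        for (Task child : t.children) {
            visits += naiveVisitCount(child);
        }
        return visits;
    }

    // De-duplicated walk: each task is visited exactly once, so the cost is
    // linear in the number of tasks plus edges.
    static int dedupedTaskCount(List<Task> roots) {
        Set<Task> visited = Collections.newSetFromMap(new IdentityHashMap<>());
        Deque<Task> stack = new ArrayDeque<>(roots);
        while (!stack.isEmpty()) {
            Task t = stack.pop();
            if (visited.add(t)) {
                stack.addAll(t.children);
            }
        }
        return visited.size();
    }

    public static void main(String[] args) {
        // Build the repl-load shape: 4 tasks in parallel -> DEP col -> 4 tasks
        // in parallel -> DEP col -> ...  Ten stages already shows the blow-up;
        // at ~200 stages the naive walk effectively never finishes.
        int stages = 10;
        List<Task> roots = new ArrayList<>();
        List<Task> frontier = new ArrayList<>();
        for (int i = 0; i < 4; i++) {
            Task t = new Task("stage0-task" + i);
            roots.add(t);
            frontier.add(t);
        }
        for (int s = 1; s <= stages; s++) {
            Task depCol = new Task("depcol" + s);
            for (Task t : frontier) {
                t.children.add(depCol);
            }
            frontier = new ArrayList<>();
            for (int i = 0; i < 4; i++) {
                Task t = new Task("stage" + s + "-task" + i);
                depCol.children.add(t);
                frontier.add(t);
            }
        }
        long naiveVisits = 0;
        for (Task root : roots) {
            naiveVisits += naiveVisitCount(root);
        }
        System.out.println("distinct tasks: " + dedupedTaskCount(roots)); // 54
        System.out.println("naive visits:   " + naiveVisits);             // ~7 million
    }
}
{code}

Whatever shape the eventual fix takes, the key change is the same as in the sketch: remember which tasks have already been visited (or memoize per-node counts) so each task is counted once, instead of once per path that reaches it.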