Date: Sun, 28 May 2017 20:51:04 +0000 (UTC)
From: "Remus Rusanu (JIRA)"
To: issues@hive.apache.org
Subject: [jira] [Updated] (HIVE-16757) Use of deprecated {{getRows()}} instead of new {{estimateRowCount(RelMetadataQuery..)}} has serious performance impact

     [ https://issues.apache.org/jira/browse/HIVE-16757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Remus Rusanu updated HIVE-16757:
--------------------------------
    Summary: Use of deprecated {{getRows()}} instead of new {{estimateRowCount(RelMetadataQuery..)}} has serious performance impact  (was: Use memoization in HiveRelMdRowCount.getRowCount)

> Use of deprecated {{getRows()}} instead of new {{estimateRowCount(RelMetadataQuery..)}} has serious performance impact
> ----------------------------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-16757
>                 URL: https://issues.apache.org/jira/browse/HIVE-16757
>             Project: Hive
>          Issue Type: Bug
>          Components: Query Planning
>            Reporter: Remus Rusanu
>            Assignee: Remus Rusanu
>         Attachments: HIVE-16757.01.patch, HIVE-16757.02.patch
>
>
> Calling Calcite's {{RelMetadataQuery.instance()}} is very expensive because it places a new memoization cache on the stack.
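To illustrate why a fresh memoization cache per call is costly, here is a small, self-contained Java model. It is a hypothetical sketch, not Calcite's or Hive's code. {{ToyMetadataQuery}} stands in for {{RelMetadataQuery}} (one cache per instance). {{ToyNode.estimateRowCount(mq)}} mirrors the style that accepts the caller's query. {{ToyNode.getRowsWithFreshQuery()}} mimics the deprecated {{getRows()}} pattern of building a new query on every call. The tree shape, row counts, and cardinality formula are invented for illustration.

```java
import java.util.IdentityHashMap;
import java.util.Map;

// Hypothetical toy model of the HIVE-16757 problem. ToyMetadataQuery and ToyNode are
// illustrative stand-ins, NOT Calcite's RelMetadataQuery / RelNode APIs.
public class RowCountMemoDemo {

    // Number of row counts actually computed (cache misses), summed over all queries.
    static int computations = 0;

    // Stand-in for a metadata query: every instance owns its own memoization cache.
    static final class ToyMetadataQuery {
        private final Map<ToyNode, Double> cache = new IdentityHashMap<>();

        double getRowCount(ToyNode node) {
            Double cached = cache.get(node);
            if (cached != null) {
                return cached; // memo hit: nothing is recomputed
            }
            computations++;
            double rows = node.estimateRowCount(this);
            cache.put(node, rows);
            return rows;
        }
    }

    // Stand-in for a relational node: either a leaf scan or a binary join.
    static final class ToyNode {
        final ToyNode left;
        final ToyNode right;
        final double leafRows;

        ToyNode(double leafRows) {
            this(null, null, leafRows);
        }

        ToyNode(ToyNode left, ToyNode right) {
            this(left, right, 0);
        }

        private ToyNode(ToyNode left, ToyNode right, double leafRows) {
            this.left = left;
            this.right = right;
            this.leafRows = leafRows;
        }

        // Preferred style: child lookups go through the caller's query, reusing its cache.
        double estimateRowCount(ToyMetadataQuery mq) {
            if (left == null) {
                return leafRows;
            }
            // Toy cardinality formula, for illustration only.
            return Math.max(mq.getRowCount(left), mq.getRowCount(right));
        }

        // Mimics the deprecated pattern: a brand-new query (and empty cache) on every call.
        double getRowsWithFreshQuery() {
            return new ToyMetadataQuery().getRowCount(this);
        }
    }

    // Builds a left-deep chain of joins over n leaf scans (2n - 1 nodes in total).
    static ToyNode leftDeepJoin(int n) {
        ToyNode root = new ToyNode(10);
        for (int i = 1; i < n; i++) {
            root = new ToyNode(root, new ToyNode(10 + i));
        }
        return root;
    }

    // Asks for the root's row count `calls` times and returns how many computations that cost.
    static int run(ToyNode root, int calls, boolean shareQuery) {
        computations = 0;
        ToyMetadataQuery shared = new ToyMetadataQuery();
        for (int i = 0; i < calls; i++) {
            if (shareQuery) {
                shared.getRowCount(root);
            } else {
                root.getRowsWithFreshQuery();
            }
        }
        return computations;
    }

    public static void main(String[] args) {
        ToyNode root = leftDeepJoin(50); // 50 leaves, 49 joins, 99 nodes
        System.out.println("fresh query per call: " + run(root, 1000, false)); // 99000
        System.out.println("shared query:         " + run(root, 1000, true));  // 99
    }
}
```

In this toy, reusing one query computes each of the 99 nodes once, however often the root is asked for. Creating a fresh query per call repeats the full traversal every time. This is a miniature version of why passing the existing {{RelMetadataQuery}} into {{estimateRowCount(mq)}} avoids repeated work.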
> Hidden in the deprecated {{AbstractRelNode.getRows()}} call is a call to {{instance()}}. In Hive we have a number of places where we call the deprecated {{getRows()}} instead of the new API {{estimateRowCount(RelMetadataQuery mq)}}, which accepts the {{RelMetadataQuery}}. In most of those places we actually have one handy to pass. Looking at a complex query (49 joins), there are 2995340 calls to {{AbstractRelNode.getRows}}, each one bypassing the current memoization cache.
> Was: -On complex queries HiveRelMdRowCount.getRowCount can get called many times. Since it does not memoize its result and the call is recursive, it results in an explosion of calls. For example, for a query with 49 joins, during join ordering (LoptOptimizeJoinRule) HiveRelMdRowCount.getRowCount gets called 6442 times as a top-level call, but the recursion expands this to 501729 calls. Memoizing the result would stop the recursion early. In my testing this reduced the join reordering time for said query from 11s to <1s.-
> Note that {{HiveRelMdRowCount}} does not need its own memoization, because the function is called in stacks similar to this:
> {code}
> at org.apache.hadoop.hive.ql.optimizer.calcite.stats.HiveRelMdRowCount.getRowCount(HiveRelMdRowCount.java:66)
> at GeneratedMetadataHandler_RowCount.getRowCount_$
> at GeneratedMetadataHandler_RowCount.getRowCount
> at org.apache.calcite.rel.metadata.RelMetadataQuery.getRowCount(RelMetadataQuery.java:204)
> at org.apache.calcite.rel.rules.LoptOptimizeJoinRule.swapInputs(LoptOptimizeJoinRule.java:1865)
> at org.apache.calcite.rel.rules.LoptOptimizeJoinRule.createJoinSubtree(LoptOptimizeJoinRule.java:1739)
> {code}
> and {{GeneratedMetadataHandler_RowCount.getRowCount}} already handles memoization.

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)