Date: Fri, 16 Dec 2016 10:46:58 +0000 (UTC)
From: "Rajesh Balamohan (JIRA)"
To: issues@hive.apache.org
Reply-To: dev@hive.apache.org
Subject: [jira] [Updated] (HIVE-15339) Batch metastore calls to get column stats for fields needed in FilterSelectivityEstimator
archived-at: Fri, 16 Dec 2016 10:47:00 -0000

     [ https://issues.apache.org/jira/browse/HIVE-15339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rajesh Balamohan updated HIVE-15339:
------------------------------------
    Attachment: HIVE-15339.5.patch

The license header was changed accidentally in the .4 patch.
Attaching the .5 patch.

> Batch metastore calls to get column stats for fields needed in FilterSelectivityEstimator
> -----------------------------------------------------------------------------------------
>
>                 Key: HIVE-15339
>                 URL: https://issues.apache.org/jira/browse/HIVE-15339
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Rajesh Balamohan
>            Priority: Minor
>         Attachments: HIVE-15339.1.patch, HIVE-15339.3.patch, HIVE-15339.4.patch, HIVE-15339.5.patch
>
>
> Depending on the query pattern, {{FilterSelectivityEstimator}} fetches column statistics from the metastore in multiple calls. For instance, in the following query, it ends up fetching individual column statistics for the flights table multiple times.
> When the table has a large number of partitions, fetching column statistics via multiple calls can be very expensive and adversely impacts the overall compilation time. The following query took 14 seconds to compile.
> {noformat}
> SELECT COUNT(`flights`.`flightnum`) AS `cnt_flightnum_ok`,
> YEAR(`flights`.`dateofflight`) AS `yr_flightdate_ok`
> FROM `flights` as `flights`
> JOIN `airlines` ON (`flights`.`uniquecarrier` = `airlines`.`code`)
> JOIN `airports` as `source_airport` ON (`flights`.`origin` = `source_airport`.`iata`)
> JOIN `airports` as `dest_airport` ON (`flights`.`dest` = `dest_airport`.`iata`)
> GROUP BY YEAR(`flights`.`dateofflight`);
> {noformat}
> It may be helpful to group all columns that need statistics and fetch them in a single remote call.
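
The idea in the issue description (collect every column that needs statistics, then fetch them in one remote call) can be sketched roughly as below. This is only an illustration and not the code in the attached patches: `ColumnStatsSource`, `CountingStatsSource`, and `BatchedStatsSketch` are hypothetical names, the interface is a stand-in for a remote metastore client rather than Hive's actual API, and the statistic values are placeholders.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical stand-in for a remote metastore client (not Hive's real API). */
interface ColumnStatsSource {
    /** One remote round trip; returns a placeholder statistic per requested column. */
    Map<String, Long> fetchColumnStats(String table, List<String> columns);
}

/** In-memory fake that counts round trips so both strategies can be compared. */
class CountingStatsSource implements ColumnStatsSource {
    int remoteCalls = 0;
    private final Map<String, Long> statsByColumn;

    CountingStatsSource(Map<String, Long> statsByColumn) {
        this.statsByColumn = statsByColumn;
    }

    @Override
    public Map<String, Long> fetchColumnStats(String table, List<String> columns) {
        remoteCalls++; // each invocation models one remote call
        Map<String, Long> result = new LinkedHashMap<>();
        for (String column : columns) {
            Long value = statsByColumn.get(column);
            if (value != null) {
                result.put(column, value);
            }
        }
        return result;
    }
}

class BatchedStatsSketch {
    /** Unbatched: one remote call per column reference, repeated references included. */
    static Map<String, Long> fetchOneByOne(ColumnStatsSource source, String table, List<String> refs) {
        Map<String, Long> stats = new LinkedHashMap<>();
        for (String column : refs) {
            stats.putAll(source.fetchColumnStats(table, Collections.singletonList(column)));
        }
        return stats;
    }

    /** Batched: de-duplicate the referenced columns, then issue a single remote call. */
    static Map<String, Long> fetchBatched(ColumnStatsSource source, String table, List<String> refs) {
        Set<String> unique = new LinkedHashSet<>(refs);
        if (unique.isEmpty()) {
            return new LinkedHashMap<>(); // nothing to fetch, so no remote call
        }
        return source.fetchColumnStats(table, new ArrayList<>(unique));
    }

    public static void main(String[] args) {
        // Placeholder values; real column statistics would come from the metastore.
        Map<String, Long> placeholder = new HashMap<>();
        placeholder.put("uniquecarrier", 20L);
        placeholder.put("origin", 300L);
        placeholder.put("dest", 300L);
        placeholder.put("dateofflight", 7000L);

        // Columns of flights referenced by the query's joins and GROUP BY.
        List<String> refs = Arrays.asList("uniquecarrier", "origin", "dest", "dateofflight", "dateofflight");

        CountingStatsSource unbatched = new CountingStatsSource(placeholder);
        CountingStatsSource batched = new CountingStatsSource(placeholder);
        fetchOneByOne(unbatched, "flights", refs);
        fetchBatched(batched, "flights", refs);
        System.out.println("unbatched remote calls: " + unbatched.remoteCalls);
        System.out.println("batched remote calls: " + batched.remoteCalls);
    }
}
```

In this sketch the unbatched path issues one call per column reference, while the batched path issues one call in total and returns the same statistics. How much compilation time that would save on a heavily partitioned table depends on the real metastore and is not measured here.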