Date: Mon, 18 Nov 2013 17:31:22 +0000 (UTC)
From: "Prasanth J (JIRA)"
To: hive-dev@hadoop.apache.org
Subject: [jira] [Updated] (HIVE-5369) Annotate hive operator tree with statistics from metastore

[ https://issues.apache.org/jira/browse/HIVE-5369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Prasanth J updated HIVE-5369:
-----------------------------
Attachment: HIVE-5369.10.patch

Fixed the failing test which was recently added.
> Annotate hive operator tree with statistics from metastore
> ----------------------------------------------------------
>
>                 Key: HIVE-5369
>                 URL: https://issues.apache.org/jira/browse/HIVE-5369
>             Project: Hive
>          Issue Type: New Feature
>          Components: Query Processor, Statistics
>    Affects Versions: 0.13.0
>            Reporter: Prasanth J
>            Assignee: Prasanth J
>              Labels: statistics
>             Fix For: 0.13.0
>
>         Attachments: HIVE-5369.1.txt, HIVE-5369.10.patch, HIVE-5369.2.WIP.txt, HIVE-5369.2.patch.txt, HIVE-5369.3.patch.txt, HIVE-5369.4.patch.txt, HIVE-5369.5.patch.txt, HIVE-5369.6.patch.txt, HIVE-5369.7.patch.txt, HIVE-5369.8.patch.txt, HIVE-5369.9.patch, HIVE-5369.9.patch.txt, HIVE-5369.WIP.txt, HIVE-5369.refactor.WIP.txt
>
>
> Currently, the statistics gathered at the table/partition level and the column level are not
> used during the query planning stage. Statistics at the table/partition and column levels can be
> used to optimize query plans. Basic statistics, like uncompressed data size, can be used for
> better reducer estimation. Other statistics, like number of rows, distinct values of columns,
> and average length of columns, can be used by the Cost Based Optimizer (CBO) to select better
> query plans. As a first step in improving query planning, the statistics that are available in
> the metastore should be attached to the Hive operator tree. The operator tree should be walked
> and annotated with statistics information. The attached statistics will vary for each operator
> depending on the operation it performs. For example, the select operator changes the average
> row size but does not affect the number of rows. Conversely, the filter operator changes the
> number of rows but does not change the average row size. Similar rules can be applied to other
> operators as well.
> Rules for different operators are added as comments in the code. For more detailed information,
> see the reference I am using: "Database Systems: The Complete Book" by Garcia-Molina et al.
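To make the select and filter rules in the description concrete, here is a minimal Java sketch of how statistics could be propagated through those two operators. It is not Hive's implementation. The class and method names (StatsAnnotationSketch, Stats, select, filterEquals) are hypothetical. The filter rule also makes an assumption of its own: an equality predicate over uniformly distributed values, estimated with the textbook rule of row count divided by the column's number of distinct values.

```java
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch of per-operator statistics propagation.
 * Not Hive's actual API; all names here are illustrative.
 */
public class StatsAnnotationSketch {

    /** Minimal stand-in for the statistics attached to an operator. */
    static final class Stats {
        final long numRows;
        final double avgRowSize; // average uncompressed bytes per row

        Stats(long numRows, double avgRowSize) {
            this.numRows = numRows;
            this.avgRowSize = avgRowSize;
        }

        /** Uncompressed data size, e.g. an input to reducer estimation. */
        double dataSize() {
            return numRows * avgRowSize;
        }
    }

    /**
     * Select (projection): the row count is unchanged; the average row size
     * becomes the sum of the average lengths of the projected columns.
     */
    static Stats select(Stats in, Map<String, Double> avgColLen, List<String> projected) {
        double rowSize = 0.0;
        for (String col : projected) {
            Double len = avgColLen.get(col);
            if (len == null) {
                throw new IllegalArgumentException("no statistics for column " + col);
            }
            rowSize += len;
        }
        return new Stats(in.numRows, rowSize);
    }

    /**
     * Filter on "col = constant" (assumption: values are uniformly spread over
     * ndv distinct values), so about numRows / ndv rows survive. Integer division
     * rounds down. The average row size is unchanged.
     */
    static Stats filterEquals(Stats in, long ndv) {
        if (ndv <= 0) {
            throw new IllegalArgumentException("ndv must be positive");
        }
        return new Stats(in.numRows / ndv, in.avgRowSize);
    }

    public static void main(String[] args) {
        // Hypothetical table: 1000 rows; column lengths sum to the 50-byte row size.
        Stats table = new Stats(1000, 50.0);
        Map<String, Double> avgColLen = Map.of("id", 8.0, "name", 20.0, "city", 22.0);

        // WHERE city = <constant>, assuming city has 10 distinct values.
        Stats afterFilter = filterEquals(table, 10);
        // SELECT id, name
        Stats afterSelect = select(afterFilter, avgColLen, List.of("id", "name"));

        System.out.println(afterFilter.numRows + " rows, " + afterFilter.avgRowSize + " bytes/row"); // 100 rows, 50.0 bytes/row
        System.out.println(afterSelect.numRows + " rows, " + afterSelect.avgRowSize + " bytes/row"); // 100 rows, 28.0 bytes/row
    }
}
```

Each operator returns a new Stats object rather than mutating its input. This mirrors the idea of walking the operator tree and attaching a separate annotation to every node.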
-- This message was sent by Atlassian JIRA (v6.1#6144)