Date: Tue, 21 Jan 2014 23:14:23 +0000 (UTC)
From: "Sergey Shelukhin (JIRA)"
To: hive-dev@hadoop.apache.org
Reply-To: dev@hive.apache.org
Subject: [jira] [Updated] (HIVE-6157) Fetching column stats slower than the 101 during rush hour

     [ https://issues.apache.org/jira/browse/HIVE-6157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sergey Shelukhin updated HIVE-6157:
-----------------------------------
    Attachment: HIVE-6157.01.patch
                HIVE-6157.nogen.patch

First patch. There's one TODO# left where I think some validation code is dead; I need to see if any tests fail with it.
Other than that, many tests I ran passed; let's see what HiveQA says.

> Fetching column stats slower than the 101 during rush hour
> ----------------------------------------------------------
>
>                 Key: HIVE-6157
>                 URL: https://issues.apache.org/jira/browse/HIVE-6157
>             Project: Hive
>          Issue Type: Bug
>   Affects Versions: 0.13.0
>            Reporter: Gunther Hagleitner
>            Assignee: Sergey Shelukhin
>         Attachments: HIVE-6157.01.patch, HIVE-6157.nogen.patch, HIVE-6157.prelim.patch
>
>
> "hive.stats.fetch.column.stats" controls whether the column stats for a table are fetched during explain (in Tez: during query planning). On my setup (1 table, 4000 partitions, 24 columns) the time spent in semantic analysis goes from ~1 second to ~66 seconds when turning the flag on. 65 seconds spent fetching column stats...
> The reason is probably that the APIs force you to make separate metastore calls for each column in each partition. That's probably the first thing that has to change. The question is whether, in addition to that, we need to cache this in the client or store the stats as a single blob in the database to further cut down on the time. As it stands right now, however, column stats seem unusable.

--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
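
To put rough numbers on the description above, here is a minimal sketch that only counts round trips. It assumes the reported setup (4000 partitions, 24 columns, about 65 seconds spent on stats) and that the reported cause is right, i.e. one metastore call per column per partition. FakeMetastore and both of its methods are hypothetical stand-ins made up for illustration; they are not the Hive metastore API and not what the attached patches do.

```python
from itertools import product

PARTITIONS = 4000    # from the issue description
COLUMNS = 24         # from the issue description
STATS_SECONDS = 65   # extra planning time reported with the flag on


class FakeMetastore:
    """Hypothetical stand-in for a metastore client; it only counts round trips."""

    def __init__(self):
        self.calls = 0

    def get_column_stats(self, partition, column):
        # One round trip per (partition, column) pair.
        self.calls += 1
        return {"partition": partition, "column": column}

    def get_column_stats_batch(self, partitions, columns):
        # One round trip for the whole request (an assumed bulk call).
        self.calls += 1
        return [{"partition": p, "column": c}
                for p, c in product(partitions, columns)]


def fetch_one_by_one(client, partitions, columns):
    """Fetch stats with a separate call for each column in each partition."""
    return [client.get_column_stats(p, c)
            for p, c in product(partitions, columns)]


def fetch_batched(client, partitions, columns):
    """Fetch the same stats with a single bulk call."""
    return client.get_column_stats_batch(partitions, columns)


partitions = [f"p{i}" for i in range(PARTITIONS)]
columns = [f"c{i}" for i in range(COLUMNS)]

naive = FakeMetastore()
naive_stats = fetch_one_by_one(naive, partitions, columns)

batched = FakeMetastore()
batched_stats = fetch_batched(batched, partitions, columns)

# Average cost per call if the 65 seconds were spread evenly over all calls.
per_call_ms = STATS_SECONDS / naive.calls * 1000

print(f"per-column calls: {naive.calls}, batched calls: {batched.calls}")
print(f"implied cost per call: {per_call_ms:.3f} ms")
```

Under these assumptions each individual call is cheap; the cost comes from the number of calls, which is why reducing the call count is the first thing the description suggests changing.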