Mailing-List: contact dev-help@drill.apache.org; run by ezmlm
Reply-To: dev@drill.apache.org
Content-Type: multipart/alternative; boundary="===============8137678126864275391=="
MIME-Version: 1.0
Subject: Re: Review Request 28417: DRILL-1742 Use Hive stats when planning queries on Hive data sources
From: "abdelhakim deneche"
To: "Aman Sinha", "abdelhakim deneche", "drill"
Date: Wed, 11 Feb 2015 17:55:03 -0000
Message-ID: <20150211175503.29076.22859@reviews.apache.org>
References: <20141125012318.15977.38530@reviews.apache.org>
In-Reply-To: <20141125012318.15977.38530@reviews.apache.org>
X-ReviewBoard-URL: https://reviews.apache.org/
X-ReviewGroup: drill-git
X-ReviewRequest-URL: https://reviews.apache.org/r/28417/
X-ReviewRequest-Repository: drill-git
Auto-Submitted: auto-generated

--===============8137678126864275391==
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: 7bit


> On Nov. 25, 2014, 1:23 a.m., Aman Sinha wrote:
> > contrib/storage-hive/core/src/main/java/org/apache/drill/exec/store/hive/HiveScan.java, lines 298-300
> >
> > This is not necessarily true; if you have empty tables, the rowcount will be 0. So you need to distinguish the case where the stats are not available (maybe use -1 as an indicator) from the case where they are available and show a rowcount of 0.
>
> abdelhakim deneche wrote:
>     The problem is that numRows=0 in the stats can actually mean the stats have not been computed yet, so we still need to estimate the row count using the size of the input splits.
>     I ran some tests using empty tables, and the estimated row count for those tables is 0 too, so it's correct.

In Hive 0.13, numRows will contain -1 when the stats were never computed.


- abdelhakim


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/28417/#review62916
-----------------------------------------------------------


On Nov. 25, 2014, 7:46 p.m., abdelhakim deneche wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/28417/
> -----------------------------------------------------------
>
> (Updated Nov. 25, 2014, 7:46 p.m.)
>
>
> Review request for drill.
>
>
> Bugs: DRILL-1742
>     https://issues.apache.org/jira/browse/DRILL-1742
>
>
> Repository: drill-git
>
>
> Description
> -------
>
> HiveScan.getSplits() already gets the table and partition metadata using MetaStoreUtils.
> We compute the total number of rows using the numRows property and store the result in the rowCount attribute, which is later returned by getScanStats().
>
>
> Diffs
> -----
>
>   contrib/storage-hive/core/src/main/java/org/apache/drill/exec/store/hive/HiveScan.java ddbc100
>
> Diff: https://reviews.apache.org/r/28417/diff/
>
>
> Testing
> -------
>
> Created several partitioned and non-partitioned tables and loaded data in Hive.
>
> Used explain plan to check the number of rows when the whole table is queried and also when specific partitions are queried (to make sure the row count takes Hive partition pruning into account).
>
>
> Thanks,
>
> abdelhakim deneche
>
>

--===============8137678126864275391==--
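
For readers of the archive, the fallback discussed in this thread can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code from the HiveScan.java patch: the class RowCountSketch, the PartitionInfo holder, and the ASSUMED_BYTES_PER_ROW constant are all hypothetical names invented here. The sketch trusts numRows only when it is positive, and otherwise (0 or -1, both of which may mean the stats were never computed) estimates from the input split size, so an empty table still comes out as 0 rows.

```java
import java.util.Arrays;
import java.util.List;

public class RowCountSketch {

    /** Hypothetical holder for the metadata of one table or partition. */
    static final class PartitionInfo {
        final long numRows;    // value of the numRows stat; 0 or -1 may mean "not computed"
        final long splitBytes; // total size in bytes of the input splits

        PartitionInfo(long numRows, long splitBytes) {
            this.numRows = numRows;
            this.splitBytes = splitBytes;
        }
    }

    // Assumed average row width, used only for the size-based estimate.
    static final long ASSUMED_BYTES_PER_ROW = 100;

    /** Sums per-partition row counts, falling back to a size-based estimate. */
    static long estimateRowCount(List<PartitionInfo> partitions) {
        long total = 0;
        for (PartitionInfo p : partitions) {
            if (p.numRows > 0) {
                // Stats are present and non-zero: use them directly.
                total += p.numRows;
            } else {
                // numRows is 0 or negative: estimate from the split size.
                // An empty table has 0 bytes, so the estimate is 0 as well.
                total += p.splitBytes / ASSUMED_BYTES_PER_ROW;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        List<PartitionInfo> partitions = Arrays.asList(
            new PartitionInfo(1000, 50_000), // stats available
            new PartitionInfo(0, 50_000),    // stats possibly not computed
            new PartitionInfo(-1, 0));       // stats never computed, empty data
        System.out.println(estimateRowCount(partitions));
    }
}
```

Summing over only the partitions that survive pruning is what lets the estimate reflect partition pruning, as described in the Testing section.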