drill-dev mailing list archives

From "Aman Sinha" <asi...@maprtech.com>
Subject Re: Review Request 28417: DRILL-1742 Use Hive stats when planning queries on Hive data sources
Date Tue, 25 Nov 2014 01:23:18 GMT

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/28417/#review62916
-----------------------------------------------------------



contrib/storage-hive/core/src/main/java/org/apache/drill/exec/store/hive/HiveScan.java
<https://reviews.apache.org/r/28417/#comment105085>

    I still think this needs to be initialized and should not depend on getSplits(), since
after your latest changes the rowCount property is no longer assumed to be available.  Also, see
my later comment about distinguishing between an empty table (0 rowcount) and one where stats
are not available.



contrib/storage-hive/core/src/main/java/org/apache/drill/exec/store/hive/HiveScan.java
<https://reviews.apache.org/r/28417/#comment105084>

    This is not necessarily true;  if you have empty tables, the rowcount will be 0. So you
need to distinguish the case where the stats are not available (maybe use -1 as an
indicator) from the case where they are available and the rowcount is 0.
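
    To illustrate both comments, here is a rough sketch only, not the actual HiveScan code.
The class RowCountSketch, the constant STATS_UNAVAILABLE and the helper methods are
hypothetical names, and a plain Map<String, String> stands in for the table or partition
parameters that carry the numRows property. It assumes that a missing or unparsable numRows
means "no stats" and that one partition without stats makes the total unknown.

```java
import java.util.List;
import java.util.Map;

// Hypothetical illustration; none of these names come from the Drill code base.
final class RowCountSketch {
    // Sentinel meaning "row-count stats are not available" (assumption: -1, as suggested above).
    static final long STATS_UNAVAILABLE = -1L;

    // Initialized at declaration, so the value is well defined even if
    // the split computation has not run yet.
    private long rowCount = STATS_UNAVAILABLE;

    // Reads numRows from one parameter map; a missing, negative or
    // unparsable value is treated as "stats not available".
    static long parseNumRows(Map<String, String> params) {
        String value = params.get("numRows");
        if (value == null) {
            return STATS_UNAVAILABLE;
        }
        try {
            long n = Long.parseLong(value.trim());
            return n >= 0 ? n : STATS_UNAVAILABLE;
        } catch (NumberFormatException e) {
            return STATS_UNAVAILABLE;
        }
    }

    // Sums numRows over all parameter maps; if any of them lacks stats,
    // the total is reported as unavailable instead of a partial sum.
    static long totalRows(List<Map<String, String>> allParams) {
        long total = 0L;
        for (Map<String, String> params : allParams) {
            long n = parseNumRows(params);
            if (n == STATS_UNAVAILABLE) {
                return STATS_UNAVAILABLE;
            }
            total += n;
        }
        return total;
    }

    // Stores the computed total (or the sentinel) in rowCount.
    void recordStats(List<Map<String, String>> allParams) {
        rowCount = totalRows(allParams);
    }

    // True for an empty table with stats (rowCount == 0), false when stats are missing.
    boolean hasStats() {
        return rowCount != STATS_UNAVAILABLE;
    }

    long getRowCount() {
        return rowCount;
    }
}
```

With something like this, the code that builds the scan stats could check hasStats() first and
fall back to a default estimate only when it returns false, while a genuine 0 is passed through.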


- Aman Sinha


On Nov. 25, 2014, 12:56 a.m., abdelhakim deneche wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/28417/
> -----------------------------------------------------------
> 
> (Updated Nov. 25, 2014, 12:56 a.m.)
> 
> 
> Review request for drill.
> 
> 
> Bugs: DRILL-1742
>     https://issues.apache.org/jira/browse/DRILL-1742
> 
> 
> Repository: drill-git
> 
> 
> Description
> -------
> 
> HiveScan.getSplits() already gets the table and partitions metadata using MetaStoreUtils.
> We compute the total number of rows using the numRows property and store the computed
> number of rows in rowCount attribute which is later returned by getScanStats().
> 
> 
> Diffs
> -----
> 
>   contrib/storage-hive/core/src/main/java/org/apache/drill/exec/store/hive/HiveScan.java ddbc100
> 
> Diff: https://reviews.apache.org/r/28417/diff/
> 
> 
> Testing
> -------
> 
> created several partitioned and non-partitioned tables, loaded data in hive.
> 
> used explain plan to check the number of rows when the whole table is queried and also
> when specific partitions are queried (to make sure the row count takes hive partition pruning
> into account)
> 
> 
> Thanks,
> 
> abdelhakim deneche
> 
>

