drill-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Jinfeng Ni" <...@maprtech.com>
Subject Re: Review Request 38796: DRILL-3209: Support reading Hive tables using Drill's native parquet reader
Date Mon, 28 Sep 2015 21:21:27 GMT

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/38796/#review100855
-----------------------------------------------------------


I mainly looked through the change in planning side. You may have another person to look at
the execution side change.

The planning side overall looks good to me. Have couple of comments below.


contrib/storage-hive/core/src/main/java/org/apache/drill/exec/planner/sql/logical/ConvertHiveParquetScanToDrillParquetScan.java
(line 89)
<https://reviews.apache.org/r/38796/#comment158101>

    You may consider put the checking of option when adding this rule into Drill's rule set.
See [1].
    
    Doing this will save the overhead to matching this rule.
    
    [1] https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillRuleSets.java#L229



contrib/storage-hive/core/src/main/java/org/apache/drill/exec/planner/sql/logical/ConvertHiveParquetScanToDrillParquetScan.java
(line 102)
<https://reviews.apache.org/r/38796/#comment158102>

    partitions.size() ==0 would be covered by line 114 (the block 107-112 will be skipped)?



contrib/storage-hive/core/src/main/java/org/apache/drill/exec/planner/sql/logical/ConvertHiveParquetScanToDrillParquetScan.java
(line 128)
<https://reviews.apache.org/r/38796/#comment158104>

    the format seems to expect 3 inputs; is "e" unused?



contrib/storage-hive/core/src/main/java/org/apache/drill/exec/planner/sql/logical/ConvertHiveParquetScanToDrillParquetScan.java
(line 176)
<https://reviews.apache.org/r/38796/#comment158108>

    I feel the block of checking "*" seems unnecesary. If I understand correctly, Hive table
is regarded as table with schema. Therefore, we should not see "*" in hiveScanRel.getRowType;
* column should have been expanded into the list of columns in Hive table.



contrib/storage-hive/core/src/main/java/org/apache/drill/exec/planner/sql/logical/ConvertHiveParquetScanToDrillParquetScan.java
(line 195)
<https://reviews.apache.org/r/38796/#comment158110>

    Seems like during the convertion, you only maintain the column names. What about the column
type? Do we need to maintain the column type as well? DrillFixedRelDataTypeImpl assume every
column has "any" type; meaning the column type in hive table is lost?



contrib/storage-hive/core/src/main/java/org/apache/drill/exec/planner/sql/logical/ConvertHiveParquetScanToDrillParquetScan.java
(line 199)
<https://reviews.apache.org/r/38796/#comment158111>

    I feel probably there is a bug in existing code HiveScan.getColumn() should not contain
"*", if Drill treats Hive table as schemaed table.



contrib/storage-hive/core/src/main/java/org/apache/drill/exec/planner/sql/logical/ConvertHiveParquetScanToDrillParquetScan.java
(line 259)
<https://reviews.apache.org/r/38796/#comment158123>

    Again, if the column type is maintained during the conversion, seems it's necessary to
cast for every column; if the hive type happens to be same as parquet type, we do not have
to cast.


- Jinfeng Ni


On Sept. 27, 2015, 7:50 a.m., Venki Korukanti wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/38796/
> -----------------------------------------------------------
> 
> (Updated Sept. 27, 2015, 7:50 a.m.)
> 
> 
> Review request for drill and Jinfeng Ni.
> 
> 
> Repository: drill-git
> 
> 
> Description
> -------
> 
> Please jira DRILL-3209 for details.
> 
> 
> Diffs
> -----
> 
>   contrib/storage-hive/core/src/main/java/org/apache/drill/exec/planner/sql/HivePartitionDescriptor.java
11c6455 
>   contrib/storage-hive/core/src/main/java/org/apache/drill/exec/planner/sql/logical/ConvertHiveParquetScanToDrillParquetScan.java
PRE-CREATION 
>   contrib/storage-hive/core/src/main/java/org/apache/drill/exec/store/hive/HiveDrillNativeParquetScan.java
PRE-CREATION 
>   contrib/storage-hive/core/src/main/java/org/apache/drill/exec/store/hive/HiveDrillNativeParquetSubScan.java
PRE-CREATION 
>   contrib/storage-hive/core/src/main/java/org/apache/drill/exec/store/hive/HiveDrillNativeScanBatchCreator.java
PRE-CREATION 
>   contrib/storage-hive/core/src/main/java/org/apache/drill/exec/store/hive/HiveScan.java
9ada569 
>   contrib/storage-hive/core/src/main/java/org/apache/drill/exec/store/hive/HiveStoragePlugin.java
23aa37f 
>   contrib/storage-hive/core/src/main/java/org/apache/drill/exec/store/hive/HiveSubScan.java
2181c2a 
>   contrib/storage-hive/core/src/main/java/org/apache/drill/exec/store/hive/schema/DrillHiveTable.java
b459ee4 
>   contrib/storage-hive/core/src/test/java/org/apache/drill/exec/TestHivePartitionPruning.java
f0b4bdc 
>   contrib/storage-hive/core/src/test/java/org/apache/drill/exec/TestHiveProjectPushDown.java
6423a36 
>   contrib/storage-hive/core/src/test/java/org/apache/drill/exec/hive/TestHiveStorage.java
9211af6 
>   contrib/storage-hive/core/src/test/java/org/apache/drill/exec/hive/TestInfoSchemaOnHiveStorage.java
6118be5 
>   contrib/storage-hive/core/src/test/java/org/apache/drill/exec/store/hive/HiveTestDataGenerator.java
34a7ed6 
>   exec/java-exec/src/main/java/org/apache/drill/exec/ExecConstants.java 0f6a5bb 
>   exec/java-exec/src/main/java/org/apache/drill/exec/server/options/SystemOptionManager.java
118f7ad 
> 
> Diff: https://reviews.apache.org/r/38796/diff/
> 
> 
> Testing
> -------
> 
> Added unittests to test reading all supported types, project pushdown and partition pruning.
Manually tested with Hive tables containing large amount of data (these tests will become
part of the regression suite).
> 
> 
> Thanks,
> 
> Venki Korukanti
> 
>


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message