drill-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Venki Korukanti" <venki.koruka...@gmail.com>
Subject Re: Review Request 38796: DRILL-3209: Support reading Hive tables using Drill's native parquet reader
Date Tue, 29 Sep 2015 16:19:20 GMT


> On Sept. 28, 2015, 2:21 p.m., Jinfeng Ni wrote:
> > contrib/storage-hive/core/src/main/java/org/apache/drill/exec/planner/sql/logical/ConvertHiveParquetScanToDrillParquetScan.java,
line 128
> > <https://reviews.apache.org/r/38796/diff/2/?file=1085484#file1085484line128>
> >
> >     the format seems to expect 3 inputs; is "e" unused?

I am passing the exception to logger so that it is printed in logs. More details here: http://stackoverflow.com/questions/6371638/slf4j-how-to-log-formatted-message-object-array-exception


> On Sept. 28, 2015, 2:21 p.m., Jinfeng Ni wrote:
> > contrib/storage-hive/core/src/main/java/org/apache/drill/exec/planner/sql/logical/ConvertHiveParquetScanToDrillParquetScan.java,
line 176
> > <https://reviews.apache.org/r/38796/diff/2/?file=1085484#file1085484line176>
> >
> >     I feel the block of checking "*" seems unnecesary. If I understand correctly,
Hive table is regarded as table with schema. Therefore, we should not see "*" in hiveScanRel.getRowType;
* column should have been expanded into the list of columns in Hive table.

You are right. Columns are always expanded in RelDataType, because we call DrillHiveTable.getRowType()
which by default returns the full schema. Removed the block.


> On Sept. 28, 2015, 2:21 p.m., Jinfeng Ni wrote:
> > contrib/storage-hive/core/src/main/java/org/apache/drill/exec/planner/sql/logical/ConvertHiveParquetScanToDrillParquetScan.java,
line 195
> > <https://reviews.apache.org/r/38796/diff/2/?file=1085484#file1085484line195>
> >
> >     Seems like during the convertion, you only maintain the column names. What about
the column type? Do we need to maintain the column type as well? DrillFixedRelDataTypeImpl
assume every column has "any" type; meaning the column type in hive table is lost?

Changed to use typeFactory.createStructType(typeList, nameList). Also removed the casts added
to regular columns.


> On Sept. 28, 2015, 2:21 p.m., Jinfeng Ni wrote:
> > contrib/storage-hive/core/src/main/java/org/apache/drill/exec/planner/sql/logical/ConvertHiveParquetScanToDrillParquetScan.java,
line 199
> > <https://reviews.apache.org/r/38796/diff/2/?file=1085484#file1085484line199>
> >
> >     I feel probably there is a bug in existing code HiveScan.getColumn() should
not contain "*", if Drill treats Hive table as schemaed table.

As it is an existing bug logged DRILL-3852 fix it separately from this patch..


- Venki


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/38796/#review100855
-----------------------------------------------------------


On Sept. 27, 2015, 7:50 a.m., Venki Korukanti wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/38796/
> -----------------------------------------------------------
> 
> (Updated Sept. 27, 2015, 7:50 a.m.)
> 
> 
> Review request for drill and Jinfeng Ni.
> 
> 
> Repository: drill-git
> 
> 
> Description
> -------
> 
> Please jira DRILL-3209 for details.
> 
> 
> Diffs
> -----
> 
>   contrib/storage-hive/core/src/main/java/org/apache/drill/exec/planner/sql/HivePartitionDescriptor.java
11c6455 
>   contrib/storage-hive/core/src/main/java/org/apache/drill/exec/planner/sql/logical/ConvertHiveParquetScanToDrillParquetScan.java
PRE-CREATION 
>   contrib/storage-hive/core/src/main/java/org/apache/drill/exec/store/hive/HiveDrillNativeParquetScan.java
PRE-CREATION 
>   contrib/storage-hive/core/src/main/java/org/apache/drill/exec/store/hive/HiveDrillNativeParquetSubScan.java
PRE-CREATION 
>   contrib/storage-hive/core/src/main/java/org/apache/drill/exec/store/hive/HiveDrillNativeScanBatchCreator.java
PRE-CREATION 
>   contrib/storage-hive/core/src/main/java/org/apache/drill/exec/store/hive/HiveScan.java
9ada569 
>   contrib/storage-hive/core/src/main/java/org/apache/drill/exec/store/hive/HiveStoragePlugin.java
23aa37f 
>   contrib/storage-hive/core/src/main/java/org/apache/drill/exec/store/hive/HiveSubScan.java
2181c2a 
>   contrib/storage-hive/core/src/main/java/org/apache/drill/exec/store/hive/schema/DrillHiveTable.java
b459ee4 
>   contrib/storage-hive/core/src/test/java/org/apache/drill/exec/TestHivePartitionPruning.java
f0b4bdc 
>   contrib/storage-hive/core/src/test/java/org/apache/drill/exec/TestHiveProjectPushDown.java
6423a36 
>   contrib/storage-hive/core/src/test/java/org/apache/drill/exec/hive/TestHiveStorage.java
9211af6 
>   contrib/storage-hive/core/src/test/java/org/apache/drill/exec/hive/TestInfoSchemaOnHiveStorage.java
6118be5 
>   contrib/storage-hive/core/src/test/java/org/apache/drill/exec/store/hive/HiveTestDataGenerator.java
34a7ed6 
>   exec/java-exec/src/main/java/org/apache/drill/exec/ExecConstants.java 0f6a5bb 
>   exec/java-exec/src/main/java/org/apache/drill/exec/server/options/SystemOptionManager.java
118f7ad 
> 
> Diff: https://reviews.apache.org/r/38796/diff/
> 
> 
> Testing
> -------
> 
> Added unittests to test reading all supported types, project pushdown and partition pruning.
Manually tested with Hive tables containing large amount of data (these tests will become
part of the regression suite).
> 
> 
> Thanks,
> 
> Venki Korukanti
> 
>


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message