hive-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Chao Sun" <chao....@cloudera.com>
Subject Re: Review Request 34666: HIVE-9152 - Dynamic Partition Pruning [Spark Branch]
Date Fri, 03 Jul 2015 22:32:40 GMT


> On May 27, 2015, 6:52 p.m., Xuefu Zhang wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkUtilities.java, line 177
> > <https://reviews.apache.org/r/34666/diff/1/?file=971700#file971700line177>
> >
> >     Any chance that an op might be visited multiple times?

It shouldn't - it'a tree traversing and every operator should only be added once.


> On May 27, 2015, 6:52 p.m., Xuefu Zhang wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/io/CombineHiveInputFormat.java, line 519
> > <https://reviews.apache.org/r/34666/diff/1/?file=971702#file971702line519>
> >
> >     numThread could be <= 0?

It could equal to 0, since getInputPaths() could return 0. This would result an IAE from newFixedThreadPool.


> On May 27, 2015, 6:52 p.m., Xuefu Zhang wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/optimizer/Optimizer.java, line 164
> > <https://reviews.apache.org/r/34666/diff/1/?file=971704#file971704line164>
> >
> >     what's this change about?

This is to delay stats annotation until we've done the DPP optimization. The generated branches
also need to be annotated with stats.


> On May 27, 2015, 6:52 p.m., Xuefu Zhang wrote:
> > ql/src/test/results/clientpositive/spark/smb_mapjoin_11.q.out, line 1914
> > <https://reviews.apache.org/r/34666/diff/1/?file=971731#file971731line1914>
> >
> >     why the stats are gone?

This is because we moved the stats annotation to SparkCompiler, and therefore in SimpleFetchOptimizer,
the fetch task generated won't have the stats.
I don't see this is a big issue and Tez does the same thing as well.


- Chao


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/34666/#review85230
-----------------------------------------------------------


On May 26, 2015, 4:28 p.m., Chao Sun wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/34666/
> -----------------------------------------------------------
> 
> (Updated May 26, 2015, 4:28 p.m.)
> 
> 
> Review request for hive, chengxiang li and Xuefu Zhang.
> 
> 
> Bugs: HIVE-9152
>     https://issues.apache.org/jira/browse/HIVE-9152
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> Tez implemented dynamic partition pruning in HIVE-7826. This is a nice optimization and
we should implement the same in HOS.
> 
> 
> Diffs
> -----
> 
>   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 43c53fc 
>   itests/src/test/resources/testconfiguration.properties 2a5f7e3 
>   metastore/src/gen/thrift/gen-cpp/ThriftHiveMetastore.h 0f86117 
>   metastore/src/gen/thrift/gen-cpp/ThriftHiveMetastore.cpp a0b34cb 
>   metastore/src/gen/thrift/gen-cpp/hive_metastore_types.h 55e0385 
>   metastore/src/gen/thrift/gen-cpp/hive_metastore_types.cpp 749c97a 
>   metastore/src/gen/thrift/gen-py/hive_metastore/ThriftHiveMetastore.py 4cc54e8 
>   ql/if/queryplan.thrift c8dfa35 
>   ql/src/gen/thrift/gen-cpp/queryplan_types.h ac73bc5 
>   ql/src/gen/thrift/gen-cpp/queryplan_types.cpp 19d4806 
>   ql/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/ql/plan/api/OperatorType.java
e18f935 
>   ql/src/gen/thrift/gen-php/Types.php 7121ed4 
>   ql/src/gen/thrift/gen-py/queryplan/ttypes.py 53c0106 
>   ql/src/gen/thrift/gen-rb/queryplan_types.rb c2c4220 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/GroupByOperator.java 9867739 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/OperatorFactory.java 91e8a02 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveSparkClientFactory.java 21398d8

>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkDynamicPartitionPruner.java PRE-CREATION

>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkUtilities.java e6c845c 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorSparkPartitionPruningSinkOperator.java
PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/io/CombineHiveInputFormat.java 1de7e40 
>   ql/src/java/org/apache/hadoop/hive/ql/io/HiveInputFormat.java 9d5730d 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/Optimizer.java ea5efe5 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/SparkDynamicPartitionPruningOptimization.java
PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/SparkRemoveDynamicPruningBySize.java
PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/SparkMapJoinResolver.java
8e56263 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/Vectorizer.java 5f731d7 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SparkPartitionPruningSinkDesc.java
PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkProcContext.java 447f104

>   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkUtils.java e27ce0d 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/OptimizeSparkProcContext.java f7586a4

>   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkCompiler.java 19aae70 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkPartitionPruningOptimizer.java
PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkPartitionPruningSinkOperator.java
PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/plan/MapWork.java 05a5841 
>   ql/src/java/org/apache/hadoop/hive/ql/plan/TableScanDesc.java aa291b9 
>   ql/src/java/org/apache/hadoop/hive/ql/ppd/SyntheticJoinPredicate.java 363e49e 
>   ql/src/test/queries/clientpositive/spark_dynamic_partition_pruning.q PRE-CREATION 
>   ql/src/test/queries/clientpositive/spark_dynamic_partition_pruning_2.q PRE-CREATION

>   ql/src/test/results/clientpositive/spark/bucket2.q.out 89c3b4c 
>   ql/src/test/results/clientpositive/spark/bucket3.q.out 2fc4855 
>   ql/src/test/results/clientpositive/spark/bucket4.q.out 44e0f9f 
>   ql/src/test/results/clientpositive/spark/column_access_stats.q.out 3e16f61 
>   ql/src/test/results/clientpositive/spark/limit_partition_metadataonly.q.out e95d2ab

>   ql/src/test/results/clientpositive/spark/list_bucket_dml_2.q.java1.7.out e38ccf8 
>   ql/src/test/results/clientpositive/spark/optimize_nullscan.q.out 881f41a 
>   ql/src/test/results/clientpositive/spark/pcr.q.out 4c22f0b 
>   ql/src/test/results/clientpositive/spark/sample3.q.out 2fe6b0d 
>   ql/src/test/results/clientpositive/spark/sample9.q.out c9823f7 
>   ql/src/test/results/clientpositive/spark/smb_mapjoin_11.q.out c3f996f 
>   ql/src/test/results/clientpositive/spark/spark_dynamic_partition_pruning.q.out PRE-CREATION

>   ql/src/test/results/clientpositive/spark/spark_dynamic_partition_pruning_2.q.out PRE-CREATION

>   ql/src/test/results/clientpositive/spark/temp_table.q.out 16d663d 
>   ql/src/test/results/clientpositive/spark/udf_example_add.q.out 7916679 
>   ql/src/test/results/clientpositive/spark/udf_in_file.q.out c769d1f 
>   ql/src/test/results/clientpositive/spark/union_view.q.out 593ce40 
>   ql/src/test/results/clientpositive/spark/vector_elt.q.out 180ea15 
>   ql/src/test/results/clientpositive/spark/vector_string_concat.q.out 9ec8538 
>   ql/src/test/results/clientpositive/spark/vectorization_decimal_date.q.out bafd62f 
>   ql/src/test/results/clientpositive/spark/vectorization_div0.q.out 30d116f 
>   ql/src/test/results/clientpositive/spark/vectorized_case.q.out daf6ad3 
>   ql/src/test/results/clientpositive/spark/vectorized_math_funcs.q.out 470d9a9 
>   ql/src/test/results/clientpositive/spark/vectorized_string_funcs.q.out ef98ae9 
>   serde/src/gen/thrift/gen-cpp/complex_types.h 3f4c760 
>   serde/src/gen/thrift/gen-cpp/complex_types.cpp 411e1b0 
>   serde/src/gen/thrift/gen-cpp/megastruct_types.cpp 2d46b7f 
>   serde/src/gen/thrift/gen-cpp/testthrift_types.h 6c84b9f 
>   serde/src/gen/thrift/gen-cpp/testthrift_types.cpp 7949f23 
>   serde/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/serde/test/ThriftTestObj.java
dda3c5f 
>   serde/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/serde2/thrift/test/Complex.java
ff0c1f2 
>   serde/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/serde2/thrift/test/MegaStruct.java
fba49e4 
>   serde/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/serde2/thrift/test/PropValueUnion.java
a50a508 
>   serde/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/serde2/thrift/test/SetIntString.java
334d225 
>   service/src/gen/thrift/gen-cpp/TCLIService.h 030475b 
>   service/src/gen/thrift/gen-cpp/TCLIService.cpp 209ce63 
>   service/src/gen/thrift/gen-cpp/TCLIService_types.h 7bceabd 
>   service/src/gen/thrift/gen-cpp/TCLIService_types.cpp 86eeea3 
>   service/src/gen/thrift/gen-cpp/ThriftHive.h b84362b 
>   service/src/gen/thrift/gen-cpp/ThriftHive.cpp 865db69 
>   service/src/gen/thrift/gen-cpp/hive_service_types.h bc0e652 
>   service/src/gen/thrift/gen-cpp/hive_service_types.cpp 255fb00 
>   service/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/service/ThriftHive.java
1c44789 
>   service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TBinaryColumn.java
6b1b054 
>   service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TBoolColumn.java
efd571c 
>   service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TByteColumn.java
169bfde 
>   service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TDoubleColumn.java
4fc5454 
>   service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TGetTablesReq.java
c973fcc 
>   service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TI16Column.java
c836630 
>   service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TI32Column.java
6c6c5f3 
>   service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TI64Column.java
cc383ed 
>   service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TRow.java a44cfb0

>   service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TRowSet.java
d16c8a4 
>   service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TStatus.java
24a746e 
>   service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TStringColumn.java
3dae460 
>   service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TTableSchema.java
ff5e54d 
>   service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TTypeDesc.java
251f86a 
>   service/src/gen/thrift/gen-py/hive_service/ThriftHive.py 33912f9 
> 
> Diff: https://reviews.apache.org/r/34666/diff/
> 
> 
> Testing
> -------
> 
> spark_dynamic_partition_pruning.q, spark_dynamic_partition_pruning_2.q - both are clone
from tez's test.
> 
> 
> Thanks,
> 
> Chao Sun
> 
>


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message