hive-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Chao Sun" <chao....@cloudera.com>
Subject Re: Review Request 34666: HIVE-9152 - Dynamic Partition Pruning [Spark Branch]
Date Wed, 08 Jul 2015 03:35:44 GMT


> On July 2, 2015, 6:36 a.m., chengxiang li wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/optimizer/SparkRemoveDynamicPruningBySize.java,
line 59
> > <https://reviews.apache.org/r/34666/diff/1/?file=971706#file971706line59>
> >
> >     The statistic data shoud be quite unaccurate after filter and group, as it's
computered based on estimation during compile time. I think threshold verification on unaccurate
data should be unacceptable as that means the threshold may not work at all.
> >     We may check this threshold in SparkPartitionPruningSinkOperator at runtime.
> 
> Chao Sun wrote:
>     Switching to runtime would be very different - here we want to check this threshold,
and avoid generating the pruning task if possible.
>     How inaccurate the stats would be? I'm fine if it's always more conservative.
> 
> chengxiang li wrote:
>     Take FilterOperator for example, the worst case is, it may just half the input rows
as its statistic, you can find the rule for FilterOperator at FilterStatsRule, so it's a bad
news that estimated statistics is not always conservative, this would make the threshold does
not work as expected sometimes. You may create a followup work for this if it changes a lot.

OK, that makes sense. I think we can address this issue in the follow-up JIRA.


- Chao


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/34666/#review90197
-----------------------------------------------------------


On July 3, 2015, 10:45 p.m., Chao Sun wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/34666/
> -----------------------------------------------------------
> 
> (Updated July 3, 2015, 10:45 p.m.)
> 
> 
> Review request for hive, chengxiang li and Xuefu Zhang.
> 
> 
> Bugs: HIVE-9152
>     https://issues.apache.org/jira/browse/HIVE-9152
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> Tez implemented dynamic partition pruning in HIVE-7826. This is a nice optimization and
we should implement the same in HOS.
> 
> 
> Diffs
> -----
> 
>   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 43c53fc 
>   itests/src/test/resources/testconfiguration.properties 2a5f7e3 
>   ql/if/queryplan.thrift c8dfa35 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/OperatorFactory.java 91e8a02 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveSparkClientFactory.java 21398d8

>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkDynamicPartitionPruner.java PRE-CREATION

>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkUtilities.java e6c845c 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorSparkPartitionPruningSinkOperator.java
PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/io/CombineHiveInputFormat.java 1de7e40 
>   ql/src/java/org/apache/hadoop/hive/ql/io/HiveInputFormat.java 9d5730d 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/DynamicPartitionPruningOptimization.java
8546d21 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/Optimizer.java ea5efe5 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/RemoveDynamicPruningBySize.java 4803959

>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/Vectorizer.java 5f731d7 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SparkPartitionPruningSinkDesc.java
PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkProcContext.java 447f104

>   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkUtils.java e27ce0d 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/OptimizeSparkProcContext.java f7586a4

>   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkCompiler.java 19aae70 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkPartitionPruningSinkOperator.java
PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SplitOpTreeForDPP.java PRE-CREATION

>   ql/src/java/org/apache/hadoop/hive/ql/plan/MapWork.java 05a5841 
>   ql/src/java/org/apache/hadoop/hive/ql/plan/TableScanDesc.java aa291b9 
>   ql/src/java/org/apache/hadoop/hive/ql/ppd/SyntheticJoinPredicate.java 363e49e 
>   ql/src/test/queries/clientpositive/spark_dynamic_partition_pruning.q PRE-CREATION 
>   ql/src/test/queries/clientpositive/spark_dynamic_partition_pruning_2.q PRE-CREATION

>   ql/src/test/results/clientpositive/spark/bucket2.q.out 89c3b4c 
>   ql/src/test/results/clientpositive/spark/bucket3.q.out 2fc4855 
>   ql/src/test/results/clientpositive/spark/bucket4.q.out 44e0f9f 
>   ql/src/test/results/clientpositive/spark/column_access_stats.q.out 3e16f61 
>   ql/src/test/results/clientpositive/spark/limit_partition_metadataonly.q.out e95d2ab

>   ql/src/test/results/clientpositive/spark/list_bucket_dml_2.q.java1.7.out e38ccf8 
>   ql/src/test/results/clientpositive/spark/optimize_nullscan.q.out 881f41a 
>   ql/src/test/results/clientpositive/spark/pcr.q.out 4c22f0b 
>   ql/src/test/results/clientpositive/spark/sample3.q.out 2fe6b0d 
>   ql/src/test/results/clientpositive/spark/sample9.q.out c9823f7 
>   ql/src/test/results/clientpositive/spark/smb_mapjoin_11.q.out c3f996f 
>   ql/src/test/results/clientpositive/spark/spark_dynamic_partition_pruning.q.out PRE-CREATION

>   ql/src/test/results/clientpositive/spark/spark_dynamic_partition_pruning_2.q.out PRE-CREATION

>   ql/src/test/results/clientpositive/spark/temp_table.q.out 16d663d 
>   ql/src/test/results/clientpositive/spark/udf_example_add.q.out 7916679 
>   ql/src/test/results/clientpositive/spark/udf_in_file.q.out c769d1f 
>   ql/src/test/results/clientpositive/spark/union_view.q.out 593ce40 
>   ql/src/test/results/clientpositive/spark/vector_elt.q.out 180ea15 
>   ql/src/test/results/clientpositive/spark/vector_string_concat.q.out 9ec8538 
>   ql/src/test/results/clientpositive/spark/vectorization_decimal_date.q.out bafd62f 
>   ql/src/test/results/clientpositive/spark/vectorization_div0.q.out 30d116f 
>   ql/src/test/results/clientpositive/spark/vectorized_case.q.out daf6ad3 
>   ql/src/test/results/clientpositive/spark/vectorized_dynamic_partition_pruning.q.out
PRE-CREATION 
>   ql/src/test/results/clientpositive/spark/vectorized_math_funcs.q.out 470d9a9 
>   ql/src/test/results/clientpositive/spark/vectorized_string_funcs.q.out ef98ae9 
> 
> Diff: https://reviews.apache.org/r/34666/diff/
> 
> 
> Testing
> -------
> 
> spark_dynamic_partition_pruning.q, spark_dynamic_partition_pruning_2.q - both are clone
from tez's test.
> 
> 
> Thanks,
> 
> Chao Sun
> 
>


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message