spark-reviews mailing list archives

From GitBox <...@apache.org>
Subject [GitHub] [spark] shahidki31 commented on a change in pull request #32704: [SPARK-35567][SQL] Fix: Explain cost is not showing statistics for all the nodes
Date Mon, 31 May 2021 07:16:54 GMT

shahidki31 commented on a change in pull request #32704:
URL: https://github.com/apache/spark/pull/32704#discussion_r642265510



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/QueryExecution.scala
##########
@@ -256,13 +255,9 @@ class QueryExecution(
 
     // trigger to compute stats for logical plans
     try {
-      optimizedPlan.foreach(_.expressions.foreach(_.foreach {

Review comment:
   @cloud-fan I tried the changes suggested above, and I don't think they fix the issue. The issue
here is that not all nodes have stats in statsCache.
   The plan below is from query3, which has no subqueries. With the above changes the Aggregate
node doesn't show stats, but with the change in this PR it does:
   
   ```
   GlobalLimit 100, Statistics(sizeInBytes=2.9 KiB, rowCount=68)
   +- LocalLimit 100, Statistics(sizeInBytes=3.5 KiB, rowCount=68)
      +- Sort [d_year#9 ASC NULLS FIRST, sum_agg#89 DESC NULLS LAST, brand_id#87 ASC NULLS FIRST], true, Statistics(sizeInBytes=3.5 KiB, rowCount=68)
         +- Aggregate [d_year#9, i_brand#62, i_brand_id#61], [d_year#9, i_brand_id#61 AS brand_id#87, i_brand#62 AS brand#88, MakeDecimal(sum(UnscaledValue(ss_ext_sales_price#45)),17,2) AS sum_agg#89]
            +- Project [d_year#9, ss_ext_sales_price#45, i_brand_id#61, i_brand#62], Statistics(sizeInBytes=14.0 MiB, rowCount=2.77E+5)
               +- Join Inner, (d_date_sk#3 = ss_sold_date_sk#53), Statistics(sizeInBytes=16.1 MiB, rowCount=2.77E+5)
                  :- Project [ss_ext_sales_price#45, ss_sold_date_sk#53, i_brand_id#61, i_brand#62], Statistics(sizeInBytes=14.0 MiB, rowCount=2.77E+5)
   ......
   ```
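   To make the statsCache point concrete, here is a minimal, self-contained Scala toy model. It is a hypothetical sketch, not Spark's code: `Node`, `Scan`, `Limit`, `Aggregate`, `treeString` and `StatsCacheDemo` are illustrative names, and the estimators are made up. It assumes, as the comment above describes, that the cost-mode explain output (in Spark, `EXPLAIN COST` or `Dataset.explain("cost")`) only prints statistics a node has already cached, and it further assumes one estimator (the toy `Limit`) that never consults its child. Under those assumptions, computing stats only at the root leaves descendants uncached, while visiting every node and calling `stats` populates every cache.

```scala
// Hypothetical toy model, NOT Spark's implementation: each node caches its
// statistics the first time `stats` is called, and the explain-style string
// prints statistics only for nodes whose cache is already populated.
final case class Stats(rowCount: Long)

abstract class Node {
  def children: Seq[Node]
  def name: String
  // Per-node estimator; subclasses decide whether to consult child stats.
  protected def computeStats(): Stats

  private var cached: Option[Stats] = None
  def statsCache: Option[Stats] = cached

  // Compute once, then serve from the cache.
  def stats: Stats = cached.getOrElse {
    val s = computeStats()
    cached = Some(s)
    s
  }

  // Pre-order traversal over this node and every descendant.
  def foreach(f: Node => Unit): Unit = {
    f(this)
    children.foreach(_.foreach(f))
  }

  // Explain-style rendering: a node shows stats only if they are cached.
  def treeString(depth: Int = 0): String = {
    val suffix = cached.map(s => s", Statistics(rowCount=${s.rowCount})").getOrElse("")
    ("  " * depth) + name + suffix + "\n" + children.map(_.treeString(depth + 1)).mkString
  }
}

class Scan(rows: Long) extends Node {
  def children: Seq[Node] = Nil
  def name: String = "Scan"
  protected def computeStats(): Stats = Stats(rows)
}

// Assumed estimator that never asks its child for stats, so the child's
// cache stays empty when only the root's stats are requested.
class Limit(n: Long, child: Node) extends Node {
  def children: Seq[Node] = Seq(child)
  def name: String = s"Limit $n"
  protected def computeStats(): Stats = Stats(n)
}

// Assumed estimator: one output row per ten input rows (purely illustrative).
class Aggregate(child: Node) extends Node {
  def children: Seq[Node] = Seq(child)
  def name: String = "Aggregate"
  protected def computeStats(): Stats = Stats(math.max(1L, child.stats.rowCount / 10))
}

object StatsCacheDemo {
  def build(): Node = new Limit(100, new Aggregate(new Scan(277000)))

  def main(args: Array[String]): Unit = {
    // Root only: Aggregate and Scan never get a cached estimate.
    val rootOnly = build()
    rootOnly.stats
    print(rootOnly.treeString())

    // Every node: each node computes and caches its own estimate.
    val everyNode = build()
    everyNode.foreach(_.stats)
    print(everyNode.treeString())
  }
}
```

   In the first rendering only the root carries statistics; in the second every node does. This mirrors the idea of triggering stats on every plan node rather than only on the root or on subquery plans, but it is a sketch under the stated assumptions, not the change in this PR.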




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

