spark-issues mailing list archives

From "Zhenhua Wang (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (SPARK-21083) Consider staleness when collecting column stats
Date Sat, 08 Jul 2017 02:29:00 GMT

     [ https://issues.apache.org/jira/browse/SPARK-21083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhenhua Wang updated SPARK-21083:
---------------------------------
    Description: 
1. If we first analyze a table without `noscan` and then analyze it with `noscan`, and the table
has not changed in between, we should keep the row count in the statistics.
2. If we first analyze one column of a table and then analyze another column, and the table has
not changed in between, we should keep the previous column stats and combine them with the newly
collected column stats.

  was:
Suppose we have already collected column stats for some columns. Then, when we collect column
stats for other columns:
* If the table changed between the two collections, we need to remove the stale column
stats and keep only the latest stats.
* Otherwise, combine the two sets of column stats.

Note that we always update sizeInBytes/rowCount when collecting column stats; that logic doesn't
need to change.


> Consider staleness when collecting column stats
> -----------------------------------------------
>
>                 Key: SPARK-21083
>                 URL: https://issues.apache.org/jira/browse/SPARK-21083
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 2.3.0
>            Reporter: Zhenhua Wang
>
> 1. If we first analyze a table without `noscan` and then analyze it with `noscan`, and the
> table has not changed in between, we should keep the row count in the statistics.
> 2. If we first analyze one column of a table and then analyze another column, and the table
> has not changed in between, we should keep the previous column stats and combine them with
> the newly collected column stats.
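
The two cases in the description can be sketched roughly as follows. This is only an
illustration in Python, not Spark's implementation: `TableStats`, `analyze_noscan` and
`analyze_columns` are hypothetical names, and the sketch assumes that "the table is not changed"
is detected by comparing the previously recorded size (and, for column stats, the row count) with
the newly measured values.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass(frozen=True)
class TableStats:
    """Hypothetical stand-in for a table's catalog statistics."""
    size_in_bytes: int
    row_count: Optional[int] = None
    col_stats: Dict[str, dict] = field(default_factory=dict)


def analyze_noscan(old: Optional[TableStats], new_size: int) -> TableStats:
    """Case 1: a `noscan` analyze only measures the table size."""
    if old is not None and old.size_in_bytes == new_size:
        # Assumption: same size means the table is unchanged, so the
        # previously collected row count is still valid and is kept.
        return old
    # The table changed (or was never analyzed): the old row count is stale.
    return TableStats(size_in_bytes=new_size)


def analyze_columns(
    old: Optional[TableStats],
    new_size: int,
    new_row_count: int,
    new_col_stats: Dict[str, dict],
) -> TableStats:
    """Case 2: analyzing some columns also refreshes size and row count."""
    unchanged = (
        old is not None
        and old.size_in_bytes == new_size
        and old.row_count == new_row_count
    )
    # Keep earlier column stats only if the table is unchanged; otherwise
    # they are stale and are dropped.
    merged = dict(old.col_stats) if unchanged else {}
    merged.update(new_col_stats)  # newly collected stats always win
    return TableStats(new_size, new_row_count, merged)
```

With this sketch, analyzing column `a` and then column `b` on an unchanged table yields stats for
both columns, while a change in size or row count between the two runs leaves only the stats for
`b`.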



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org

