spark-issues mailing list archives

From "jin xing (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (SPARK-22334) Check table size from HDFS in case the size in metastore is wrong.
Date Mon, 23 Oct 2017 13:03:00 GMT

     [ https://issues.apache.org/jira/browse/SPARK-22334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

jin xing updated SPARK-22334:
-----------------------------
    Summary: Check table size from HDFS in case the size in metastore is wrong.  (was: Check
table size from Hdfs in case the size in metastore is wrong.)

> Check table size from HDFS in case the size in metastore is wrong.
> ------------------------------------------------------------------
>
>                 Key: SPARK-22334
>                 URL: https://issues.apache.org/jira/browse/SPARK-22334
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.2.0
>            Reporter: jin xing
>
> Currently we use the table property 'totalSize' to decide whether to use a broadcast join. Table
properties come from the metastore, but they can be wrong: Hive sometimes fails to update table
properties even after producing data successfully (e.g. a NameNode timeout, see https://github.com/apache/hive/blob/branch-1.2/metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreUtils.java#L180).
If 'totalSize' in the table properties is much smaller than the table's real size on HDFS, Spark can launch
a broadcast join by mistake and suffer an OOM.
> Could we add a defensive config and check the size from HDFS when 'totalSize' is below
{{spark.sql.autoBroadcastJoinThreshold}}?
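A minimal sketch of the defensive check being proposed, in Scala. The object and method names here are hypothetical, not part of Spark's API; in practice the filesystem size would come from something like `FileSystem.getContentSummary(path).getLength`, but it is passed in here as a by-name parameter so the (potentially expensive) HDFS call only happens when the metastore size would actually trigger a broadcast:

```scala
object BroadcastSizeCheck {
  // Hypothetical defensive check: if the metastore's 'totalSize' already rules
  // out broadcasting, trust it; only when it would permit a broadcast do we
  // re-verify against the size reported by the filesystem. `fsSizeInBytes` is
  // by-name, so the filesystem is consulted only in the suspicious case.
  def shouldBroadcast(metastoreSizeInBytes: Long,
                      fsSizeInBytes: => Long,
                      autoBroadcastJoinThreshold: Long): Boolean = {
    if (metastoreSizeInBytes > autoBroadcastJoinThreshold) {
      false // metastore size alone already rules out a broadcast join
    } else {
      // metastore claims the table is small; double-check on the filesystem
      // before risking an OOM from broadcasting an under-reported table
      fsSizeInBytes <= autoBroadcastJoinThreshold
    }
  }
}
```

With this shape, a table whose 'totalSize' was left stale at a small value would still be rejected for broadcasting once the HDFS size exceeds the threshold.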



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org

