spark-issues mailing list archives

From "jin xing (JIRA)" <j...@apache.org>
Subject [jira] [Created] (SPARK-22334) Check table size from Hdfs in case the size in metastore is wrong.
Date Mon, 23 Oct 2017 13:02:02 GMT
jin xing created SPARK-22334:
--------------------------------

             Summary: Check table size from Hdfs in case the size in metastore is wrong.
                 Key: SPARK-22334
                 URL: https://issues.apache.org/jira/browse/SPARK-22334
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 2.2.0
            Reporter: jin xing


Currently we use the table property 'totalSize' to decide whether to use a broadcast join. Table properties
come from the metastore, but they can be wrong: Hive sometimes fails to update table properties
even after writing the data successfully (e.g., a NameNode timeout; see https://github.com/apache/hive/blob/branch-1.2/metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreUtils.java#L180).
If 'totalSize' in the table properties is much smaller than the table's real size on HDFS, Spark can launch
a broadcast join by mistake and suffer an OOM.
Could we add a defensive config and check the size on HDFS when 'totalSize' is below {{spark.sql.autoBroadcastJoinThreshold}}?
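As a rough illustration (not an actual patch), the fallback could look like the sketch below. It assumes the table's HDFS location is known, that the threshold is configured in plain bytes, and uses hypothetical names such as {{verifiedTableSize}} and {{tableLocation}}:

{code:scala}
import org.apache.hadoop.fs.Path
import org.apache.spark.sql.SparkSession

object TableSizeCheck {
  // Return a size that is safe to use for the broadcast-join decision.
  // If the metastore's 'totalSize' claims the table is small enough to
  // broadcast, double-check against the actual bytes on HDFS, since Hive
  // may have failed to update table properties after writing the data.
  def verifiedTableSize(
      spark: SparkSession,
      tableLocation: String,
      metastoreTotalSize: Long): Long = {
    // Assumes the threshold is set as a plain byte count, e.g. "10485760".
    val threshold = spark.conf.get("spark.sql.autoBroadcastJoinThreshold").toLong
    if (metastoreTotalSize >= threshold) {
      // Metastore already says the table is too big to broadcast; trust it.
      metastoreTotalSize
    } else {
      // 'totalSize' looks suspiciously small; verify against HDFS.
      val path = new Path(tableLocation)
      val fs = path.getFileSystem(spark.sparkContext.hadoopConfiguration)
      val hdfsSize = fs.getContentSummary(path).getLength
      math.max(metastoreTotalSize, hdfsSize)
    }
  }
}
{code}

The HDFS check only runs when the metastore value is below the threshold, so the extra {{getContentSummary}} call is paid only for tables that would otherwise be broadcast.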



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org

