spark-issues mailing list archives

From "Dongjoon Hyun (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SPARK-28046) OOM caused by building hash table when the compressed ratio of small table is normal
Date Mon, 17 Jun 2019 01:55:00 GMT

    [ https://issues.apache.org/jira/browse/SPARK-28046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16865236#comment-16865236 ]

Dongjoon Hyun commented on SPARK-28046:
---------------------------------------

In general, this `LongToUnsafeRowMap` issue is the same as SPARK-24912.

> OOM caused by building hash table when the compressed ratio of small table is normal
> ------------------------------------------------------------------------------------
>
>                 Key: SPARK-28046
>                 URL: https://issues.apache.org/jira/browse/SPARK-28046
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 2.4.1
>            Reporter: Ke Jia
>            Priority: Major
>         Attachments: image-2019-06-14-10-34-53-379.png
>
>
> Currently, Spark converts a sort merge join (SMJ) to a broadcast hash join (BHJ) when the
compressed size of the small table is <= the broadcast threshold. Like Spark, AE (adaptive
execution) also converts SMJ to BHJ at runtime based on the compressed size. In our test, with
AE enabled and a 32M broadcast threshold, one SMJ with a 16M compressed size was converted to BHJ.
However, when building the hash table, the 16M small table decompressed to 2GB with a row count
of 134485048, and it contains a large number of consecutive, repeated values. As a result, the
following OOM exception occurred while building the hash table:
> !image-2019-06-14-10-34-53-379.png!
> Based on this finding, it may not be reasonable to decide whether to convert SMJ to BHJ
by the compressed size alone. In AE, we added a condition on the estimated decompressed
size, derived from the row count. Spark may also need to check the decompressed size or row
count, not only the compressed size, when converting SMJ to BHJ.
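
The check proposed in the description can be illustrated with a small sketch. This is not Spark or AE code: the function names, the separate `max_decompressed_bytes` limit, and the 16-byte average row width (roughly 2GB divided by 134485048 rows) are assumptions made for illustration only. It shows how a table that passes a compressed-size check can still be rejected once a row-count-based size estimate is considered.

```python
# Hypothetical sketch; none of these names are Spark APIs.
MIB = 1024 * 1024


def estimate_decompressed_bytes(row_count: int, avg_row_bytes: int) -> int:
    """Rough in-memory size estimate from the row count and an assumed row width."""
    return row_count * avg_row_bytes


def should_broadcast(
    compressed_bytes: int,
    row_count: int,
    avg_row_bytes: int,
    threshold_bytes: int,
    max_decompressed_bytes: int,
) -> bool:
    """Allow a broadcast hash join only if both size checks pass."""
    # Existing rule from the report: compressed size <= broadcast threshold.
    if compressed_bytes > threshold_bytes:
        return False
    # Additional rule (assumed form): the estimated decompressed size must
    # also stay under a separate, hypothetical limit.
    estimated = estimate_decompressed_bytes(row_count, avg_row_bytes)
    return estimated <= max_decompressed_bytes


# Numbers from the report: 16M compressed, 134485048 rows, 32M threshold.
# avg_row_bytes=16 and the 256 MiB limit are assumptions for illustration.
compressed_only = 16 * MIB <= 32 * MIB
with_estimate = should_broadcast(16 * MIB, 134_485_048, 16, 32 * MIB, 256 * MIB)
print(compressed_only, with_estimate)  # True False
```

With the reported numbers, the compressed-size check alone accepts the broadcast, while the row-count estimate (about 2GB) rejects it. How the limit and the per-row width should actually be chosen is left open by the description.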



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


