spark-issues mailing list archives

From "Dongjoon Hyun (Jira)" <>
Subject [jira] [Updated] (SPARK-26302) retainedBatches configuration can eat up memory on driver
Date Mon, 16 Mar 2020 22:54:06 GMT


Dongjoon Hyun updated SPARK-26302:
    Affects Version/s:     (was: 3.0.0)

> retainedBatches configuration can eat up memory on driver
> ---------------------------------------------------------
>                 Key: SPARK-26302
>                 URL:
>             Project: Spark
>          Issue Type: Improvement
>          Components: Documentation, DStreams
>    Affects Versions: 3.1.0
>            Reporter: Behroz Sikander
>            Priority: Minor
>         Attachments: heap_dump_detail.png
> The documentation for configuration "spark.streaming.ui.retainedBatches" says
> "How many batches the Spark Streaming UI and status APIs remember before garbage collecting"
> The default for this configuration is 1000.
> From our experience, the documentation is incomplete, and we found this out the hard way.
> The size of a single BatchUIData object is around 750 KB. Increasing this value to
> something like 5000 raises the total retained size to roughly 4 GB.
> If the driver heap is not large enough, the job starts to slow down, suffers frequent
> GC pauses, and shows long scheduling delays. Once the heap is full, the job cannot
> recover.
> A note of caution should be added to the documentation so that users understand the
> impact of this seemingly harmless configuration property.
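The arithmetic behind the reported numbers can be sketched as follows. This is a rough estimate only: the ~750 KB per BatchUIData figure comes from one workload's heap dump, and the function and constant names here are illustrative, not part of Spark.

```python
# Rough driver-heap estimate for spark.streaming.ui.retainedBatches.
# PER_BATCH_BYTES (~750 KB) is the per-BatchUIData size observed in this
# issue's heap dump; actual size varies with the workload.

PER_BATCH_BYTES = 750 * 1024

def retained_ui_heap_gib(retained_batches: int) -> float:
    """Approximate heap (GiB) held by the streaming UI batch history."""
    return retained_batches * PER_BATCH_BYTES / (1024 ** 3)

print(f"default (1000 batches): ~{retained_ui_heap_gib(1000):.2f} GiB")
print(f"5000 batches:           ~{retained_ui_heap_gib(5000):.2f} GiB")
```

By this estimate, the default of 1000 batches already holds on the order of 0.7 GiB on the driver, and 5000 batches reaches the ~4 GB the report describes. If the UI history is not needed, the property can be lowered, e.g. via `--conf spark.streaming.ui.retainedBatches=100` on `spark-submit`.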

This message was sent by Atlassian Jira

