spark-issues mailing list archives

From "Nikunj Bansal (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SPARK-25302) ReducedWindowedDStream not using checkpoints for reduced RDDs
Date Sat, 01 Sep 2018 08:12:00 GMT

    [ https://issues.apache.org/jira/browse/SPARK-25302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16599565#comment-16599565 ]

Nikunj Bansal commented on SPARK-25302:
---------------------------------------

I have a potential fix available for this issue and for SPARK-25303.

> ReducedWindowedDStream not using checkpoints for reduced RDDs
> -------------------------------------------------------------
>
>                 Key: SPARK-25302
>                 URL: https://issues.apache.org/jira/browse/SPARK-25302
>             Project: Spark
>          Issue Type: Bug
>          Components: DStreams
>    Affects Versions: 2.0.0, 2.0.1, 2.0.2, 2.1.0, 2.1.1, 2.1.2, 2.1.3, 2.2.0, 2.2.1, 2.2.2, 2.3.0, 2.3.1
>            Reporter: Nikunj Bansal
>            Priority: Major
>              Labels: Streaming, streaming
>
> When reduceByKeyAndWindow() is called with an inverse reduce function, it eventually creates
> a ReducedWindowedDStream. This class creates a reducedDStream but only persists it and does
> not checkpoint it. As a result, it keeps using cached RDDs and never cuts the lineage back
> to the input DStream, so the input RDDs stay cached for much longer than they are needed.
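
The lineage problem described above can be illustrated with a toy model. This is plain Python, not Spark code, and it is only a sketch under simplifying assumptions: ToyRDD, reachable_inputs, and run are hypothetical names invented here, and each "reduced" node simply depends on the previous reduced node, the newest input batch, and the batch that just left the window. It ignores how Spark actually builds its RDD graph and only counts which input batches remain reachable through parent links.

```python
class ToyRDD:
    """Hypothetical stand-in for an RDD: a name plus the parents it derives from."""

    def __init__(self, name, parents=()):
        self.name = name
        self.parents = list(parents)

    def checkpoint(self):
        # Assumption of this model: checkpointing saves the data elsewhere,
        # so the links to the parents can be dropped (lineage is cut).
        self.parents = []


def reachable_inputs(rdd):
    """Return the names of input batches still reachable through rdd's lineage."""
    seen, stack, inputs = set(), [rdd], set()
    while stack:
        node = stack.pop()
        if id(node) in seen:
            continue
        seen.add(id(node))
        if node.name.startswith("input-"):
            inputs.add(node.name)
        stack.extend(node.parents)
    return inputs


def run(num_batches, window, checkpoint_every=None):
    """Build one reduced node per batch and report what the last one still references.

    Each reduced node is modelled as: previous reduced + new input - expired input,
    which is the shape of an incremental window reduce with an inverse function.
    """
    inputs, reduced = [], None
    for t in range(num_batches):
        inputs.append(ToyRDD(f"input-{t}"))
        parents = [inputs[t]]
        if reduced is not None:
            parents.append(reduced)
        if t >= window:
            parents.append(inputs[t - window])
        reduced = ToyRDD(f"reduced-{t}", parents)
        if checkpoint_every and (t + 1) % checkpoint_every == 0:
            reduced.checkpoint()
    return reachable_inputs(reduced)


if __name__ == "__main__":
    # Without checkpointing, every input batch ever seen stays reachable.
    print(sorted(run(10, window=3)))
    # With periodic checkpointing, only batches after the last cut remain.
    print(sorted(run(10, window=3, checkpoint_every=4)))
```

In this model the uncheckpointed run keeps all ten input batches reachable even though the window spans only three, while the checkpointed run keeps only the batches referenced since the last checkpoint. That mirrors the report's claim that persisting without checkpointing leaves the lineage to the input DStream intact.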



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org

