spark-issues mailing list archives

From "Arun Mahadevan (JIRA)" <>
Subject [jira] [Commented] (SPARK-24036) Stateful operators in continuous processing
Date Wed, 25 Apr 2018 20:37:00 GMT


Arun Mahadevan commented on SPARK-24036:

Hi [~joseph.torres], I am also interested in contributing to this effort, if you are open to it.

> Supporting single partition aggregates. I have a substantially complete prototype of
this in [] - it doesn't really involve design
as much as removing a very silly hack I put in earlier.

Does it require saving the aggregate state by injecting epoch markers into the stream, or does it
just work using the iterator approach, since it involves only a single partition?
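
To make the question concrete, here is a minimal Python sketch that assumes a single partition whose iterator interleaves rows with hypothetical `EpochMarker` objects. The aggregate (a per-key sum) is snapshotted each time a marker arrives. `EpochMarker` and `running_sum` are made-up names for illustration, not Spark's continuous-processing API.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Iterator, Tuple, Union


@dataclass(frozen=True)
class EpochMarker:
    """Hypothetical marker injected into a partition's row stream at an epoch boundary."""
    epoch: int


Row = Tuple[str, int]  # (grouping key, value to sum)


def running_sum(
    stream: Iterable[Union[Row, EpochMarker]],
) -> Iterator[Tuple[int, Dict[str, int]]]:
    """Sum values per key over one partition, yielding (epoch, state snapshot) at each marker."""
    state: Dict[str, int] = {}
    for item in stream:
        if isinstance(item, EpochMarker):
            # Epoch boundary: emit a copy so later rows do not mutate the saved snapshot.
            yield item.epoch, dict(state)
        else:
            key, value = item
            state[key] = state.get(key, 0) + value
```

For the stream `("a", 1), ("b", 2), EpochMarker(0), ("a", 3), EpochMarker(1)`, this yields the snapshot `{"a": 1, "b": 2}` for epoch 0 and `{"a": 4, "b": 2}` for epoch 1. With one partition there is nothing to align, so the marker can simply be another element of the iterator.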

To extend this to support multiple partitions and shuffles, shouldn't epoch markers be injected
into the stream, with the state saved only after the markers have been received from all the
parent tasks?
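
A minimal Python sketch of the alignment described above, assuming each parent task sends its rows and then an end-of-epoch marker in order, and that all parents number epochs identically. Input that arrives from a parent that has already sent its marker belongs to the next epoch, so it is buffered until every parent has caught up. This is a barrier-alignment scheme (the approach Apache Flink uses for aligned checkpoints), not Spark code; `AligningReader` and `EpochMarker` are hypothetical names.

```python
from collections import defaultdict, deque
from dataclasses import dataclass
from typing import DefaultDict, Deque, Dict, List, Set, Tuple


@dataclass(frozen=True)
class EpochMarker:
    """Hypothetical end-of-epoch marker sent by each parent task."""
    epoch: int


class AligningReader:
    """Hypothetical downstream task that saves state only after every parent's marker arrives."""

    def __init__(self, num_parents: int) -> None:
        self.num_parents = num_parents
        self.marked: Set[int] = set()  # parents whose current-epoch marker has arrived
        self.pending: DefaultDict[int, Deque[object]] = defaultdict(deque)
        self.state: Dict[str, int] = {}  # running sum per key
        self.checkpoints: List[Tuple[int, Dict[str, int]]] = []

    def receive(self, parent: int, item: object) -> None:
        if parent in self.marked:
            self.pending[parent].append(item)  # parent is ahead: hold its input
        else:
            self._step(parent, item)
        self._drain()

    def _step(self, parent: int, item: object) -> None:
        if isinstance(item, EpochMarker):
            self.marked.add(parent)
            if len(self.marked) == self.num_parents:
                # All parents reached the boundary: save state, then start the next epoch.
                self.checkpoints.append((item.epoch, dict(self.state)))
                self.marked.clear()
        else:
            key, value = item
            self.state[key] = self.state.get(key, 0) + value

    def _drain(self) -> None:
        # Replay buffered input of unblocked parents until no parent can make progress.
        progressed = True
        while progressed:
            progressed = False
            for parent in range(self.num_parents):
                buffered = self.pending[parent]
                while buffered and parent not in self.marked:
                    self._step(parent, buffered.popleft())
                    progressed = True
```

For example, with two parents: if parent 0 sends `("a", 1)`, its marker, then `("a", 10)`, the `("a", 10)` row is held back. Once parent 1 sends `("b", 2)` and its marker, the epoch 0 checkpoint is `{"a": 1, "b": 2}`, and only then is `("a", 10)` applied to the state.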

> Just write RPC endpoints on both ends tossing rows around, optimizing for throughput
later if needed. (I'm leaning towards this one.)

So do buffering of rows between stages and handling of back-pressure need to be considered
here? Would the existing shuffle infrastructure make this easier to handle?
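
To illustrate the buffering and back-pressure concern, here is a minimal Python sketch in which a bounded `queue.Queue` stands in for whatever transport the RPC endpoints would use. Because the buffer has a fixed capacity, a slow downstream stage blocks the upstream `put` instead of letting buffered rows grow without limit. `run_stages` is a hypothetical helper, not Spark's RPC or shuffle API.

```python
import queue
import threading
from typing import List, Sequence, Tuple


def run_stages(rows: Sequence[object], capacity: int) -> Tuple[List[object], int]:
    """Hypothetical two-stage pipeline connected by a bounded buffer.

    Returns the rows seen downstream and the largest buffer size observed upstream.
    """
    channel: "queue.Queue[object]" = queue.Queue(maxsize=capacity)
    end = object()  # sentinel marking the end of the upstream stream
    received: List[object] = []
    peak = 0

    def upstream() -> None:
        nonlocal peak
        for row in rows:
            channel.put(row)  # blocks while the buffer is full: this is the back-pressure
            peak = max(peak, channel.qsize())
        channel.put(end)

    def downstream() -> None:
        while True:
            item = channel.get()
            if item is end:
                return
            received.append(item)

    producer = threading.Thread(target=upstream)
    consumer = threading.Thread(target=downstream)
    producer.start()
    consumer.start()
    producer.join()
    consumer.join()
    return received, peak
```

However fast the upstream stage runs, the buffer never holds more than `capacity` items, and rows arrive downstream in the order they were sent.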


> Stateful operators in continuous processing
> -------------------------------------------
>                 Key: SPARK-24036
>                 URL:
>             Project: Spark
>          Issue Type: Improvement
>          Components: Structured Streaming
>    Affects Versions: 2.4.0
>            Reporter: Jose Torres
>            Priority: Major
> The first iteration of continuous processing in Spark 2.3 does not work with stateful

This message was sent by Atlassian JIRA
