spark-user mailing list archives

From "Shixiong(Ryan) Zhu" <shixi...@databricks.com>
Subject Re: [Structured Streaming]Data processing and output trigger should be decoupled
Date Wed, 30 Aug 2017 17:59:06 GMT
I don't think that's a good idea. If the engine keeps on processing data
but doesn't output anything, where would the intermediate data be kept?
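The objection can be illustrated with a minimal, hypothetical sketch in plain Java (not a Spark API). If results are produced eagerly but only released when the output trigger fires, everything produced since the last trigger has to be buffered somewhere. `BufferedEmitter`, `processed`, `onTrigger` and `pendingSize` are invented names for illustration. The sketch assumes the buffer simply lives in memory.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration, not part of Spark: results are computed eagerly
// but held back until the output trigger fires.
public class BufferedEmitter<T> {
    // Intermediate results that have been computed but not yet output.
    private final List<T> pending = new ArrayList<>();

    // Called each time the engine finishes processing a chunk of input early.
    void processed(List<T> results) {
        pending.addAll(results);
    }

    // Called when the output trigger fires: release and clear everything buffered.
    List<T> onTrigger() {
        List<T> out = new ArrayList<>(pending);
        pending.clear();
        return out;
    }

    // Number of results currently held between triggers.
    int pendingSize() {
        return pending.size();
    }

    public static void main(String[] args) {
        BufferedEmitter<String> emitter = new BufferedEmitter<>();
        emitter.processed(List.of("r1", "r2"));
        emitter.processed(List.of("r3"));
        System.out.println("buffered before trigger: " + emitter.pendingSize());
        System.out.println("emitted at trigger: " + emitter.onTrigger());
        System.out.println("buffered after trigger: " + emitter.pendingSize());
    }
}
```

In this sketch the buffer grows with the input rate multiplied by the trigger interval. With a 10-minute trigger, up to ten minutes of results would have to be held somewhere before anything is output.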

On Wed, Aug 30, 2017 at 9:26 AM, KevinZwx <kevinzwx1992@gmail.com> wrote:

> Hi,
>
> I'm working with Structured Streaming, and I'm wondering whether the
> trigger mechanism could be improved.
>
> Currently, when I specify a trigger, e.g.
> trigger(Trigger.ProcessingTime("10 minutes")), the engine begins processing
> data when each trigger fires, at 10:00:00, 10:10:00, 10:20:00, and so on. If
> the engine takes 10s to process a batch, we get the output at 10:00:10, and
> then the engine just waits without processing any data. When the next
> trigger fires, the engine processes the data that arrived during the
> interval; if this batch takes 15s, we get the result at 10:10:15. This is
> the problem.
>
> In my view, the trigger and data processing should be decoupled: the
> engine should keep processing data as fast as possible but only emit
> output at each trigger, so that we get results exactly at 10:00:00,
> 10:10:00, 10:20:00, and so on. Is there an existing solution, or a plan
> to work on this?
>
>
>
> --
> Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/
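The timing described in the quoted message can be made concrete with a small, self-contained Java simulation. It does not use Spark at all. It computes when output appears under the current behavior and under the proposed one. `TriggerTimeline`, `coupledOutputTimes` and `decoupledOutputTimes` are hypothetical names for illustration. The model assumes each batch finishes well within the 10-minute interval. For the proposed design, it also assumes that eager processing always keeps up with the input.

```java
import java.time.Duration;
import java.time.LocalTime;
import java.util.ArrayList;
import java.util.List;

public class TriggerTimeline {

    // Current (coupled) behavior as described in the message: batch i starts when
    // trigger i fires, so its output appears at triggerTime(i) + processingTime(i).
    // Assumes every batch finishes well within the trigger interval.
    static List<LocalTime> coupledOutputTimes(LocalTime firstTrigger, Duration interval,
                                              List<Duration> processingTimes) {
        List<LocalTime> outputs = new ArrayList<>();
        for (int i = 0; i < processingTimes.size(); i++) {
            LocalTime triggerTime = firstTrigger.plus(interval.multipliedBy(i));
            outputs.add(triggerTime.plus(processingTimes.get(i)));
        }
        return outputs;
    }

    // Proposed (decoupled) behavior: data is processed eagerly between triggers,
    // so (assuming processing keeps up) results are already available and are
    // emitted exactly when each trigger fires.
    static List<LocalTime> decoupledOutputTimes(LocalTime firstTrigger, Duration interval,
                                                int batches) {
        List<LocalTime> outputs = new ArrayList<>();
        for (int i = 0; i < batches; i++) {
            outputs.add(firstTrigger.plus(interval.multipliedBy(i)));
        }
        return outputs;
    }

    public static void main(String[] args) {
        LocalTime first = LocalTime.of(10, 0, 0);
        Duration interval = Duration.ofMinutes(10);
        // Processing durations taken from the example: 10s, then 15s.
        List<Duration> processing = List.of(Duration.ofSeconds(10), Duration.ofSeconds(15));

        System.out.println("coupled:   " + coupledOutputTimes(first, interval, processing));
        System.out.println("decoupled: " + decoupledOutputTimes(first, interval, processing.size()));
    }
}
```

Under the coupled model, each output lags its trigger by that batch's processing time, giving 10:00:10 and 10:10:15. The decoupled model emits exactly on the trigger boundaries, which is what the message asks for.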
