spark-user mailing list archives

From Alessandro Solimando <alessandro.solima...@gmail.com>
Subject Re: Structured Streaming & Query Planning
Date Thu, 14 Mar 2019 17:59:50 GMT
Hello Paolo,
generally speaking, query planning is mostly based on statistics and
distributions of data values for the involved columns, which may change
significantly over time in a streaming context. So to me it makes a lot of
sense that planning runs at every trigger, even though I understand your
concern.
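
To put a number on that per-trigger overhead, here is a minimal sketch. It assumes PySpark, where `StreamingQuery.recentProgress` yields one dict-like progress entry per trigger (plain dicts in Spark 3.x), each with a `durationMs` map containing phase timings such as `queryPlanning` and `triggerExecution`. The helpers `planning_share` and `average_planning_share` are hypothetical, not part of Spark, and the sample values are made up for illustration. Note that this measures planning against the trigger's execution time, not against the configured trigger interval.

```python
# Minimal sketch (assumption): each progress entry is dict-like with a "durationMs"
# map of phase name -> milliseconds, as in PySpark's StreamingQuery.recentProgress.
# planning_share / average_planning_share are hypothetical helpers, not Spark APIs.
from typing import Iterable, Mapping, Optional


def planning_share(progress: Mapping) -> Optional[float]:
    """Fraction of one trigger's execution time spent in query planning."""
    durations = progress.get("durationMs") or {}
    planning = durations.get("queryPlanning")
    total = durations.get("triggerExecution")
    if planning is None or not total:
        # Triggers that process no new data may not report a planning phase.
        return None
    return planning / total


def average_planning_share(progresses: Iterable[Mapping]) -> Optional[float]:
    """Average planning share over the triggers that report one."""
    shares = [s for s in (planning_share(p) for p in progresses) if s is not None]
    return sum(shares) / len(shares) if shares else None


if __name__ == "__main__":
    # Illustrative, made-up entries shaped like durationMs maps.
    sample = [
        {"durationMs": {"queryPlanning": 330, "triggerExecution": 1000}},
        {"durationMs": {"queryPlanning": 310, "triggerExecution": 1000}},
        {"durationMs": {"triggerExecution": 5}},  # idle trigger, no planning phase
    ]
    print(f"{average_planning_share(sample):.0%}")
```

On a running query, something like `average_planning_share(query.recentProgress)` would summarize the most recent triggers that Spark keeps in memory.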

As for the second question, I don't know how to cache the computed query
plan (or whether you even can).

If possible, would you mind sharing your findings afterwards? (Query
planning in streaming is a very interesting and not yet sufficiently
explored topic, IMO.)

Best regards,
Alessandro

On Thu, 14 Mar 2019 at 16:51, Paolo Platter <paolo.platter@agilelab.it>
wrote:

> Hi All,
>
> I would like to understand why a streaming query (which should not be
> able to change its behaviour across iterations) spends time on
> queryPlanning (in my case 33% of the trigger interval) at every
> schedule. I don't understand why this is needed, or whether it is possible
> to disable or cache it.
>
> Thanks
>
> *Paolo Platter*
> *CTO*
> E-mail: paolo.platter@agilelab.it
> Web Site: www.agilelab.it
