spark-issues mailing list archives

From "Jungtaek Lim (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SPARK-24634) Add a new metric regarding number of rows later than watermark
Date Sat, 23 Jun 2018 02:32:00 GMT

    [ https://issues.apache.org/jira/browse/SPARK-24634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16520948#comment-16520948 ]

Jungtaek Lim commented on SPARK-24634:
--------------------------------------

Working on this. Will submit a patch soon.

> Add a new metric regarding number of rows later than watermark
> --------------------------------------------------------------
>
>                 Key: SPARK-24634
>                 URL: https://issues.apache.org/jira/browse/SPARK-24634
>             Project: Spark
>          Issue Type: Improvement
>          Components: Structured Streaming
>    Affects Versions: 2.4.0
>            Reporter: Jungtaek Lim
>            Priority: Major
>
> Spark filters out late rows (rows that arrive behind the watermark) when applying
> operations that rely on windows. While Spark exposes watermark information to
> StreamingQueryListener, it provides no information about rows that are filtered out
> because of the watermark. That information would help end users adjust the watermark
> while operating their queries.
> We could expose a metric for the number of rows that are later than the watermark and
> are therefore filtered out. It would be ideal to support a side output for consuming
> late rows, but that does not look easy, so this issue addresses the metric first.
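
To make the proposal concrete, here is a minimal sketch in plain Python of the behavior described above: rows older than the current watermark are dropped, and a counter records how many were dropped. This is a toy model and not Spark's implementation; the class name `WatermarkFilter`, the field `late_rows`, and the rule that the watermark advances at the end of each batch are assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class WatermarkFilter:
    """Toy model of watermark-based filtering (hypothetical, not Spark code)."""
    delay: int               # allowed lateness, in the same unit as event time
    max_event_time: int = 0  # largest event time seen so far
    watermark: int = 0       # rows with event time below this are dropped
    late_rows: int = 0       # hypothetical metric: rows dropped as late

    def process_batch(self, event_times: Iterable[int]) -> List[int]:
        kept = []
        for t in event_times:
            if t < self.watermark:
                # Row is later than the watermark: drop it and count it.
                self.late_rows += 1
            else:
                kept.append(t)
                self.max_event_time = max(self.max_event_time, t)
        # Assumption: the watermark only moves forward, once per batch.
        self.watermark = max(self.watermark, self.max_event_time - self.delay)
        return kept


f = WatermarkFilter(delay=10)
first = f.process_batch([100, 105, 120])   # nothing is late yet
second = f.process_batch([105, 115, 130])  # 105 is behind the watermark
print(first, second, f.watermark, f.late_rows)
```

A steadily growing counter of this kind would suggest that the watermark delay is too small for the data's actual lateness, which is the tuning signal the issue asks for.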



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org

