spark-dev mailing list archives

From Jacek Laskowski <ja...@japila.pl>
Subject Re: [SQL] Why no numOutputRows metric for LocalTableScanExec in webUI?
Date Fri, 17 Nov 2017 08:04:46 GMT
Hi Ryan,

That makes a lot of sense! Thanks for steering me in the right direction.
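
A rough way to check the "no tasks, no metrics" explanation is to count scheduler events around each query. The sketch below is only an illustration: it assumes a local session (in spark-shell, paste it with :paste and reuse the existing `spark`), and `eventsFor` is a hypothetical helper name, not a Spark API. The one-second sleep is a crude allowance for the listener bus delivering events asynchronously.

```scala
import java.util.concurrent.atomic.AtomicInteger

import org.apache.spark.scheduler.{SparkListener, SparkListenerJobStart, SparkListenerTaskEnd}
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.length

// Assumption: a throwaway local session; spark-shell already provides `spark`.
val spark = SparkSession.builder().master("local[*]").appName("metrics-check").getOrCreate()
import spark.implicits._

val jobsStarted = new AtomicInteger(0)
val tasksEnded = new AtomicInteger(0)

// Count every job start and task end seen on the listener bus.
spark.sparkContext.addSparkListener(new SparkListener {
  override def onJobStart(jobStart: SparkListenerJobStart): Unit = {
    jobsStarted.incrementAndGet()
  }
  override def onTaskEnd(taskEnd: SparkListenerTaskEnd): Unit = {
    tasksEnded.incrementAndGet()
  }
})

// Hypothetical helper: run a query, then print how many job/task events it produced.
def eventsFor(label: String)(query: => Unit): Unit = {
  val jobsBefore = jobsStarted.get()
  val tasksBefore = tasksEnded.get()
  query
  Thread.sleep(1000) // events arrive asynchronously; this wait is a crude approximation
  println(s"$label: jobs=${jobsStarted.get() - jobsBefore}, tasks=${tasksEnded.get() - tasksBefore}")
}

val names = Seq("Jacek", "Agata").toDF("name")
eventsFor("show")(names.show())
eventsFor("groupBy")(names.groupBy(length($"name")).count().show())
```

If the first query really runs without any jobs, its counters should stay flat while the aggregation moves them, which would match the difference seen in the SQL tab.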

Quoting SQLMetric [1]:

> Updates on the driver side must be explicitly posted using
> SQLMetrics.postDriverMetricUpdates().

Why doesn't LocalTableScanExec follow the "must" requirement?
FileSourceScanExec does (and so does BroadcastExchangeExec, but that's
not a data source, so it may have different reasons).
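
For reference, the driver-side posting pattern could look roughly like the sketch below. It is loosely modeled on what FileSourceScanExec appears to do in the 2.3 code base and is not a proposed patch for LocalTableScanExec. `postRowCount` is a hypothetical helper, and the sketch assumes that `SQLExecution.EXECUTION_ID_KEY` and `SQLMetrics.postDriverMetricUpdates(sc, executionId, metrics)` are available as in Spark 2.3.

```scala
import org.apache.spark.SparkContext
import org.apache.spark.sql.execution.SQLExecution
import org.apache.spark.sql.execution.metric.{SQLMetric, SQLMetrics}

// Hypothetical helper (not part of Spark): update a metric on the driver
// and post the new value so that the SQL listener can pick it up.
def postRowCount(sc: SparkContext, numOutputRows: SQLMetric, rowCount: Long): Unit = {
  numOutputRows.add(rowCount)
  // The execution id is a local property that is set while a query executes;
  // outside a query it is null, in which case nothing is expected to be posted.
  val executionId = sc.getLocalProperty(SQLExecution.EXECUTION_ID_KEY)
  SQLMetrics.postDriverMetricUpdates(sc, executionId, numOutputRows :: Nil)
}
```

The point of the sketch is only that the update travels as a driver-side listener event rather than through task completion, which is why it would not depend on any tasks having run.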

[1]
https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/metric/SQLMetrics.scala#L31-L32

Regards,
Jacek Laskowski
----
https://about.me/JacekLaskowski
Spark Structured Streaming https://bit.ly/spark-structured-streaming
Mastering Apache Spark 2 https://bit.ly/mastering-apache-spark
Follow me at https://twitter.com/jaceklaskowski

On Fri, Nov 17, 2017 at 2:30 AM, Shixiong(Ryan) Zhu <shixiong@databricks.com> wrote:

> SQL metrics are collected using SparkListener. If there are no
> tasks, org.apache.spark.sql.execution.ui.SQLListener cannot collect any
> metrics.
>
> On Thu, Nov 16, 2017 at 1:53 AM, Jacek Laskowski <jacek@japila.pl> wrote:
>
>> Hi,
>>
>> I seem to have figured out why the metric is not in the web UI for the
>> query, but wish I knew how to explain it for any metric and operator.
>>
>> It seems that the numOutputRows metric is not displayed in the web UI
>> when a query runs no Spark jobs.
>>
>> val names = Seq("Jacek", "Agata").toDF("name")
>>
>> // no numOutputRows metric in the web UI
>> names.show
>>
>> // this query does show the numOutputRows metric in the web UI's
>> // Details for Query page (SQL tab)
>> names.groupBy(length($"name")).count.show
>>
>> That must be fairly generic, and I think it has nothing to do with
>> LocalTableScanExec. Could anyone explain it in more detail? I'd
>> appreciate it.
>>
>> On Wed, Nov 15, 2017 at 10:14 PM, Jacek Laskowski <jacek@japila.pl> wrote:
>>
>>> Hi,
>>>
>>> I've been playing with LocalTableScanExec and noticed that it defines
>>> a numOutputRows metric, but I couldn't find it in the diagram on the
>>> web UI's Details for Query page in the SQL tab. Why?
>>>
>>> scala> spark.version
>>> res1: String = 2.3.0-SNAPSHOT
>>>
>>> scala> val hello = udf { s: String => s"Hello $s" }
>>> hello: org.apache.spark.sql.expressions.UserDefinedFunction =
>>> UserDefinedFunction(<function1>,StringType,Some(List(StringType)))
>>>
>>> scala> Seq("Jacek").toDF("name").select(hello($"name")).show
>>> +-----------+
>>> |  UDF(name)|
>>> +-----------+
>>> |Hello Jacek|
>>> +-----------+
>>>
>>> http://localhost:4040/SQL/execution/?id=0 shows no metrics for
>>> LocalTableScan. Is this intended?
>>>
>>
>>
>
