spark-issues mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SPARK-26327) Metrics in FileSourceScanExec not update correctly
Date Mon, 10 Dec 2018 15:24:00 GMT

    [ https://issues.apache.org/jira/browse/SPARK-26327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16714908#comment-16714908
] 

ASF GitHub Bot commented on SPARK-26327:
----------------------------------------

xuanyuanking opened a new pull request #23277: [SPARK-26327][SQL] Metrics in FileSourceScanExec
not update correctly
URL: https://github.com/apache/spark/pull/23277
 
 
   ## What changes were proposed in this pull request?
   
   As described in [SPARK-26327](https://issues.apache.org/jira/browse/SPARK-26327),
the bug occurs because `postDriverMetricUpdates` is called in the wrong place. This PR fixes
it by splitting the initialization of `selectedPartitions` from the metrics-updating logic.
The updating logic is now added to the initialization of `inputRDD`, so it takes effect in
both the code generation node and the normal node.

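
   The split can be illustrated with a minimal, self-contained sketch. It is not the code from this pull request: `ToyListener`, `ToyScan`, and `FixSketch` are hypothetical stand-ins, and the listener is assumed to simply drop updates for executions it does not consider live.

```scala
import scala.collection.mutable

// Toy stand-in for SQLAppStatusListener (an assumption, not Spark code):
// it only accepts metric updates for executions registered as live.
object ToyListener {
  private val liveExecutions = mutable.Set.empty[Long]
  val received = mutable.Map.empty[String, Long]

  def start(id: Long): Unit = liveExecutions += id
  def end(id: Long): Unit = liveExecutions -= id

  // Returns true if the update was recorded, false if it was dropped.
  def postDriverMetricUpdates(id: Long, metrics: Map[String, Long]): Boolean =
    if (liveExecutions.contains(id)) { received ++= metrics; true } else false
}

// Hypothetical scan node following the split described above.
class ToyScan(executionId: Long, files: Seq[String]) {
  // Pure: listing files no longer posts metrics as a side effect.
  lazy val selectedPartitions: Seq[String] = files

  // Touches selectedPartitions early, as `metadata` does via toString.
  def metadata: String = s"files=${selectedPartitions.size}"

  // Metrics are posted here, when the input RDD is first built.
  lazy val inputRDD: Seq[String] = {
    ToyListener.postDriverMetricUpdates(
      executionId, Map("numFiles" -> selectedPartitions.size.toLong))
    selectedPartitions
  }
}

object FixSketch {
  def run(): Option[Long] = {
    val scan = new ToyScan(executionId = 1L, files = Seq("a.parquet", "b.parquet"))
    val description = scan.metadata // early access, before the execution is live
    ToyListener.start(1L)           // execution becomes live
    val rdd = scan.inputRDD         // metrics are posted now, so they are recorded
    ToyListener.end(1L)
    ToyListener.received.get("numFiles")
  }

  def main(args: Array[String]): Unit = println(run()) // prints Some(2)
}
```

   Because the early `metadata` access no longer has a side effect, the order in which `selectedPartitions` is first touched stops mattering in this sketch.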
   ## How was this patch tested?
   
   New test case in `SQLMetricsSuite`.
   Manual test:
   
   |         | Before | After |
   |---------|:--------:|:-------:|
   | CodeGen |![image](https://user-images.githubusercontent.com/4833765/49741753-13c7e800-fcd2-11e8-97a8-8057b657aa3c.png)|![image](https://user-images.githubusercontent.com/4833765/49741774-1f1b1380-fcd2-11e8-98d9-78b950f4e43a.png)|
   | Normal  |![image](https://user-images.githubusercontent.com/4833765/49741836-378b2e00-fcd2-11e8-80c3-ab462a6a3184.png)|![image](https://user-images.githubusercontent.com/4833765/49741860-4a056780-fcd2-11e8-9ef1-863de217f183.png)|
   



> Metrics in FileSourceScanExec not update correctly
> --------------------------------------------------
>
>                 Key: SPARK-26327
>                 URL: https://issues.apache.org/jira/browse/SPARK-26327
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 2.4.0
>            Reporter: Yuanjian Li
>            Priority: Major
>
> In the current approach in `FileSourceScanExec`, the "numFiles" and "metadataTime" (fileListingTime)
> metrics are updated when the lazy val `selectedPartitions` is initialized. However, `selectedPartitions`
> is first initialized by `metadata`, which is called by `queryExecution.toString` in `SQLExecution.withNewExecutionId`.
> So when `SQLMetrics.postDriverMetricUpdates` is called, there is no corresponding entry in liveExecutions
> in SQLAppStatusListener yet, and the metrics update has no effect.
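
The ordering problem described in the issue can be reproduced in a small toy model. This is a sketch under stated assumptions, not Spark code: `LiveExecutions`, `BuggyScan`, and `BugSketch` are hypothetical names, and the listener is assumed to silently ignore updates for executions it has not registered.

```scala
import scala.collection.mutable

// Toy model of the listener: updates for unknown executions are dropped.
class LiveExecutions {
  private val live = mutable.Set.empty[Long]
  val recorded = mutable.Map.empty[String, Long]

  def register(id: Long): Unit = live += id

  def post(id: Long, metrics: Map[String, Long]): Unit =
    if (live.contains(id)) recorded ++= metrics // otherwise silently ignored
}

// Hypothetical scan node reproducing the problematic ordering.
class BuggyScan(id: Long, files: Seq[String], listener: LiveExecutions) {
  // Side effect inside the lazy val: runs once, on first access only.
  lazy val selectedPartitions: Seq[String] = {
    listener.post(id, Map("numFiles" -> files.size.toLong))
    files
  }

  def metadata: String = s"files=${selectedPartitions.size}"
}

object BugSketch {
  def run(): Option[Long] = {
    val listener = new LiveExecutions
    val scan = new BuggyScan(1L, Seq("a.parquet", "b.parquet"), listener)
    val plan = scan.metadata            // first access: posts before registration
    listener.register(1L)               // the execution becomes live too late
    val parts = scan.selectedPartitions // already initialized, nothing is re-posted
    listener.recorded.get("numFiles")
  }

  def main(args: Array[String]): Unit = println(run()) // prints None
}
```

Since a lazy val body runs exactly once, the update posted during the early `metadata` call is the only one ever sent, and in this model it is lost.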




