spark-issues mailing list archives

From "Josh Rosen (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SPARK-7413) Time to write shuffle spill files is not captured in ShuffleWriteMetrics
Date Tue, 12 May 2015 01:29:00 GMT

    [ https://issues.apache.org/jira/browse/SPARK-7413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14539067#comment-14539067 ]

Josh Rosen commented on SPARK-7413:
-----------------------------------

Actually, it looks like we sort of try to do this in a very confusing block of code at
the end of writePartitionedFile:

{code}
    context.taskMetrics.incMemoryBytesSpilled(memoryBytesSpilled)
    context.taskMetrics.incDiskBytesSpilled(diskBytesSpilled)
    context.taskMetrics.shuffleWriteMetrics.filter(_ => bypassMergeSort).foreach { m =>
      if (curWriteMetrics != null) {
        m.incShuffleBytesWritten(curWriteMetrics.shuffleBytesWritten)
        m.incShuffleWriteTime(curWriteMetrics.shuffleWriteTime)
        m.incShuffleRecordsWritten(curWriteMetrics.shuffleRecordsWritten)
      }
    }

    lengths
  }
{code}

In spillToPartitionFiles, it looks like curWriteMetrics only ever has one value, so we do
capture the proper write metrics on that path.  In spillToMergeableFile, curWriteMetrics is
re-assigned several times but its value doesn't seem to be read anywhere, which suggests that
we might not be counting metrics properly for that path.
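
To make the concern concrete, here is a minimal, self-contained sketch of the hazard. It is not Spark code: WriteMetrics, writeBatch, and the per-record byte and time costs are hypothetical stand-ins, and the sketch only assumes that each batch's writer fills in a fresh metrics object, as the re-assignment in spillToMergeableFile seems to imply.

```scala
// Hypothetical stand-in for a write-metrics object; NOT Spark's ShuffleWriteMetrics.
final class WriteMetrics {
  var bytesWritten: Long = 0L
  var writeTimeNanos: Long = 0L
  var recordsWritten: Long = 0L

  // Fold another metrics object into this one.
  def add(other: WriteMetrics): Unit = {
    bytesWritten += other.bytesWritten
    writeTimeNanos += other.writeTimeNanos
    recordsWritten += other.recordsWritten
  }
}

object SpillMetricsSketch {
  // Pretend each batch is written by its own writer, which fills in a fresh metrics object.
  private def writeBatch(records: Int): WriteMetrics = {
    val m = new WriteMetrics
    m.recordsWritten = records.toLong
    m.bytesWritten = records * 10L    // made-up size: 10 bytes per record
    m.writeTimeNanos = records * 100L // made-up cost: 100 ns per record
    m
  }

  // Suspected pattern: the variable is re-assigned per batch but never read,
  // so nothing ever reaches the task-level total.
  def reassignedOnly(batches: Seq[Int]): WriteMetrics = {
    var cur: WriteMetrics = null
    val total = new WriteMetrics
    for (b <- batches) {
      cur = writeBatch(b) // the previous value is dropped here without being read
    }
    total
  }

  // Alternative: fold each batch's metrics into the total as soon as the batch is written.
  def accumulated(batches: Seq[Int]): WriteMetrics = {
    val total = new WriteMetrics
    for (b <- batches) total.add(writeBatch(b))
    total
  }

  def main(args: Array[String]): Unit = {
    val batches = Seq(3, 5, 2)
    println(reassignedOnly(batches).recordsWritten) // 0
    println(accumulated(batches).recordsWritten)    // 10
  }
}
```

If the real code behaves like reassignedOnly, the write time and bytes for every spill batch would be lost; accumulating before each re-assignment is one possible fix, not necessarily the right one for ExternalSorter.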

It's possible that the current code might be correct and that I'm just misinterpreting it,
but I find the current code to be extremely convoluted and hard to understand.  We should
strongly consider writing proper tests for this and refactoring it early in 1.5.

> Time to write shuffle spill files is not captured in ShuffleWriteMetrics
> ------------------------------------------------------------------------
>
>                 Key: SPARK-7413
>                 URL: https://issues.apache.org/jira/browse/SPARK-7413
>             Project: Spark
>          Issue Type: Bug
>          Components: Shuffle
>            Reporter: Josh Rosen
>
> In ExternalSorter's {{spillToMergeableFile()}} method, we pass ShuffleWriteMetrics instances
> to the disk writers, but discard the {{shuffleWriteTime}} metrics captured here.  I think
> that we should account for this IO time, possibly by introducing new metrics to distinguish
> time spent writing spills vs. writing final shuffle output and extending the UI to break down
> the overall IO write time in terms of these two components.
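
One way the two-component breakdown proposed in the description could look, as a rough sketch only: WriteTimeBreakdown, its field names, and the timed helper are hypothetical and do not exist in Spark. The sketch assumes that spill writes and final-output writes can each be timed separately.

```scala
// Hypothetical metrics holder; the names are illustrative and not part of Spark's API.
final case class WriteTimeBreakdown(spillWriteNanos: Long = 0L, outputWriteNanos: Long = 0L) {
  // Overall IO write time is the sum of the two components.
  def totalWriteNanos: Long = spillWriteNanos + outputWriteNanos

  def addSpill(nanos: Long): WriteTimeBreakdown =
    copy(spillWriteNanos = spillWriteNanos + nanos)

  def addOutput(nanos: Long): WriteTimeBreakdown =
    copy(outputWriteNanos = outputWriteNanos + nanos)
}

object WriteTimeBreakdownSketch {
  // Runs `body` and returns its result together with the elapsed wall-clock nanoseconds.
  def timed[A](body: => A): (A, Long) = {
    val start = System.nanoTime()
    val result = body
    (result, System.nanoTime() - start)
  }

  def main(args: Array[String]): Unit = {
    // Thread.sleep stands in for the actual disk writes.
    val (_, spillNanos) = timed { Thread.sleep(5) }
    val (_, outputNanos) = timed { Thread.sleep(2) }
    val m = WriteTimeBreakdown().addSpill(spillNanos).addOutput(outputNanos)
    println(s"spill=${m.spillWriteNanos}ns output=${m.outputWriteNanos}ns total=${m.totalWriteNanos}ns")
  }
}
```

Keeping the two components separate while still exposing their sum would let a UI show the existing overall write time alongside the new breakdown.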





