spark-issues mailing list archives

From "Kay Ousterhout (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (SPARK-3570) Shuffle write time does not include time to open shuffle files
Date Wed, 17 Sep 2014 20:01:35 GMT

     [ https://issues.apache.org/jira/browse/SPARK-3570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kay Ousterhout updated SPARK-3570:
----------------------------------
    Attachment: 3a_1410854905_0_job_log_waterfall.pdf
                3a_1410943402_0_job_log_waterfall.pdf

In case anyone is extra curious about this, here are two plots of the same job, with the
fixed logging (which includes file open time) in the first job.  You can see that fixing this
metric can be the difference between mysterious straggler tasks and stragglers that are clearly
due to disk activity.

> Shuffle write time does not include time to open shuffle files
> --------------------------------------------------------------
>
>                 Key: SPARK-3570
>                 URL: https://issues.apache.org/jira/browse/SPARK-3570
>             Project: Spark
>          Issue Type: Bug
>    Affects Versions: 0.9.2, 1.0.2, 1.1.0
>            Reporter: Kay Ousterhout
>            Assignee: Kay Ousterhout
>         Attachments: 3a_1410854905_0_job_log_waterfall.pdf, 3a_1410943402_0_job_log_waterfall.pdf
>
>
> Currently, the reported shuffle write time does not include time to open the shuffle
> files.  This time can be very significant when the disk is highly utilized and many shuffle
> files exist on the machine (I'm not sure how severe this is in 1.0 onward -- since shuffle
> files are automatically deleted, this may be less of an issue because there are fewer old
> files sitting around).  In experiments I did, in extreme cases, adding the time to open files
> can increase the shuffle write time from 5ms (of a 2 second task) to 1 second.  We should
> fix this for better performance debugging.
> Thanks [~shivaram] for helping to diagnose this problem.  cc [~pwendell]
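
To illustrate the metric gap described above, here is a minimal Python sketch (not Spark's
actual code) of a writer that accumulates a write-time counter.  The class name
TimedShuffleWriter, the include_open_time flag, and the open_fn parameter are all
hypothetical and exist only for this illustration.  When include_open_time is false, only
the write calls are timed, so a slow file open does not show up in the reported time.

```python
import time


class TimedShuffleWriter:
    """Hypothetical stand-in for a shuffle file writer; not Spark's actual class."""

    def __init__(self, path, include_open_time=True, open_fn=open):
        # Accumulated time attributed to "shuffle write", in nanoseconds.
        self.write_time_ns = 0
        start = time.perf_counter_ns()
        # Opening the file can block for a long time when the disk is busy.
        self._file = open_fn(path, "wb")
        if include_open_time:
            # Count the open as part of the reported write time.
            self.write_time_ns += time.perf_counter_ns() - start

    def write(self, data):
        # Time each write call and add it to the running total.
        start = time.perf_counter_ns()
        self._file.write(data)
        self.write_time_ns += time.perf_counter_ns() - start

    def close(self):
        self._file.close()
```

Passing an open_fn that sleeps before opening the file simulates a highly utilized disk:
with include_open_time set, the delay appears in write_time_ns; without it, the task is
just as slow but the metric does not explain why.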



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org

