spark-issues mailing list archives

From "Kay Ousterhout (JIRA)" <>
Subject [jira] [Updated] (SPARK-3570) Shuffle write time does not include time to open shuffle files
Date Wed, 17 Sep 2014 20:01:35 GMT


Kay Ousterhout updated SPARK-3570:
    Attachment: 3a_1410854905_0_job_log_waterfall.pdf

In case anyone is extra curious, attached are two plots of the same job; the first uses the
fixed logging (which includes file open time).  You can see that fixing this
metric can be the difference between mysterious straggler tasks and stragglers that are clearly
due to disk activity.

> Shuffle write time does not include time to open shuffle files
> --------------------------------------------------------------
>                 Key: SPARK-3570
>                 URL:
>             Project: Spark
>          Issue Type: Bug
>    Affects Versions: 0.9.2, 1.0.2, 1.1.0
>            Reporter: Kay Ousterhout
>            Assignee: Kay Ousterhout
>         Attachments: 3a_1410854905_0_job_log_waterfall.pdf, 3a_1410943402_0_job_log_waterfall.pdf
> Currently, the reported shuffle write time does not include time to open the shuffle
> files.  This time can be very significant when the disk is highly utilized and many shuffle
> files exist on the machine (I'm not sure how severe this is in 1.0 onward -- since shuffle
> files are automatically deleted, this may be less of an issue because there are fewer old
> files sitting around).  In experiments I did, in extreme cases, adding the time to open files
> can increase the shuffle write time from 5ms (of a 2 second task) to 1 second.  We should
> fix this for better performance debugging.
> Thanks [~shivaram] for helping to diagnose this problem.  cc [~pwendell]
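
To make the gap described above concrete, here is a minimal sketch (not Spark's actual code) of how the same write can report two different times depending on where the timer starts. The function name `timed_shuffle_write` and its return shape are hypothetical, chosen only for illustration.

```python
import os
import tempfile
import time


def timed_shuffle_write(path, records):
    """Write byte records to path and time the write two ways.

    Hypothetical helper for illustration; it is not part of Spark.
    Returns (write_only_ns, with_open_ns):
      write_only_ns starts the clock after the file is already open,
      with_open_ns also counts the time spent opening the file.
    """
    open_start = time.perf_counter_ns()
    with open(path, "wb") as out:  # opening can be slow on a busy disk
        write_start = time.perf_counter_ns()
        for record in records:
            out.write(record)
    end = time.perf_counter_ns()  # file is flushed and closed here
    return end - write_start, end - open_start


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "shuffle_0_0_0.data")  # made-up file name
        write_only, with_open = timed_shuffle_write(target, [b"x" * 1024] * 100)
        print(f"write only: {write_only} ns, including open: {with_open} ns")
```

On an idle disk the two numbers will usually be close; the issue is about the case where the open itself dominates, which only the second number would show.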

This message was sent by Atlassian JIRA
