spark-issues mailing list archives

From "Thomas Graves (JIRA)" <>
Subject [jira] [Commented] (SPARK-24611) Clean up OutputCommitCoordinator
Date Thu, 21 Jun 2018 14:23:00 GMT


Thomas Graves commented on SPARK-24611:

[~joshrosen] I just noticed you were the last one to modify the DAGScheduler's output commit handling, where it split ShuffleMapStage handling from ResultStage handling.

Do you know of any case where a ShuffleMapStage actually calls into canCommit?

> Clean up OutputCommitCoordinator
> --------------------------------
>                 Key: SPARK-24611
>                 URL:
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>    Affects Versions: 2.4.0
>            Reporter: Marcelo Vanzin
>            Priority: Major
> This is a follow up to SPARK-24589, to address some issues brought up during review of the change:
> - the DAGScheduler registers all stages with the coordinator, when at first view only result stages need to. That would save memory in the driver.
> - the coordinator can track task IDs instead of the internal "TaskIdentifier" type it uses; that would also save some memory, and also be more accurate.
> - {{TaskCommitDenied}} currently has a "job ID" when it's really a stage ID, and it contains the task attempt number, when it should probably have the task ID instead (like above).
> The latter is an API breakage (in a class tagged as developer API, but still), and also affects data written to event logs.
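
For illustration only, the first two bullets in the description could be sketched roughly as below. This is a simplified Python model, not Spark's actual Scala code: the class name CommitCoordinatorSketch and its methods (stage_start, stage_end, can_commit, task_failed) are hypothetical, and denying commits for unregistered stages is an assumption of the sketch. It registers only result stages and remembers the authorized committer for each partition by its task ID.

```python
from typing import Dict


class CommitCoordinatorSketch:
    """Hypothetical first-committer-wins coordinator (not Spark's real API)."""

    def __init__(self) -> None:
        # stage ID -> {partition -> task ID of the attempt allowed to commit}
        self._stages: Dict[int, Dict[int, int]] = {}

    def stage_start(self, stage_id: int, is_result_stage: bool) -> None:
        # Only result stages get any state; shuffle map stages are skipped,
        # which is where the driver memory saving would come from.
        if is_result_stage:
            self._stages[stage_id] = {}

    def stage_end(self, stage_id: int) -> None:
        self._stages.pop(stage_id, None)

    def can_commit(self, stage_id: int, partition: int, task_id: int) -> bool:
        committers = self._stages.get(stage_id)
        if committers is None:
            # Assumption of this sketch: unregistered stages are denied.
            return False
        # The first task to ask becomes the committer for the partition;
        # repeated requests from the same task ID are still granted.
        return committers.setdefault(partition, task_id) == task_id

    def task_failed(self, stage_id: int, partition: int, task_id: int) -> None:
        committers = self._stages.get(stage_id)
        # Release the lock only if the failed task was the one holding it.
        if committers is not None and committers.get(partition) == task_id:
            del committers[partition]
```

Keying on a single task ID, rather than on a compound identifier, keeps one integer per partition, which is the kind of saving the second bullet seems to be after.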
