spark-dev mailing list archives

From Jeff Zhang <zjf...@gmail.com>
Subject Re: Is OutputCommitCoordinator necessary for all the stages ?
Date Tue, 11 Aug 2015 23:35:01 GMT
Hi Josh,

I mean on the driver side. OutputCommitCoordinator.startStage is called in
DAGScheduler#submitMissingTasks for every stage, which costs some memory.
That said, as long as the executor side doesn't make the RPC, there is not
much of a performance penalty.
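
To make the trade-off concrete, here is a rough toy model in Python. It is
not Spark's code: the class ToyCommitCoordinator, its method names, and the
commit_requests counter are all made up for illustration. It only assumes
what is discussed in this thread, namely that the driver registers state for
every stage, while a commit request (the RPC) happens only when a task
actually tries to commit output.

```python
class ToyCommitCoordinator:
    """Hypothetical stand-in for a driver-side commit coordinator."""

    def __init__(self):
        # stage id -> {partition id -> attempt number allowed to commit}
        self.committers_by_stage = {}
        # Counts simulated executor-to-driver commit requests (the "RPCs").
        self.commit_requests = 0

    def stage_start(self, stage_id):
        # Called for every submitted stage, so each stage costs one entry,
        # whether or not any of its tasks ever commits output.
        self.committers_by_stage[stage_id] = {}

    def stage_end(self, stage_id):
        # Release the per-stage state once the stage is finished.
        self.committers_by_stage.pop(stage_id, None)

    def can_commit(self, stage_id, partition, attempt):
        # Only tasks that write output reach this point.
        self.commit_requests += 1
        committers = self.committers_by_stage.get(stage_id)
        if committers is None:
            return False  # unknown or already finished stage
        # The first attempt to ask wins; later attempts are denied.
        winner = committers.setdefault(partition, attempt)
        return winner == attempt


coordinator = ToyCommitCoordinator()
coordinator.stage_start(0)  # e.g. a shuffle map stage: tracked, never commits
coordinator.stage_start(1)  # e.g. a result stage that writes output

first = coordinator.can_commit(1, partition=0, attempt=0)
second = coordinator.can_commit(1, partition=0, attempt=1)  # duplicate attempt
print(first, second, coordinator.commit_requests)
```

In this sketch stage 0 costs one empty dict on the driver and never produces
a commit request, which is the memory-versus-RPC distinction I am pointing at.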

On Wed, Aug 12, 2015 at 12:17 AM, Josh Rosen <rosenville@gmail.com> wrote:

> Can you clarify what you mean by "used for all stages"?
> OutputCommitCoordinator RPCs should only be initiated through
> SparkHadoopMapRedUtil.commitTask(), so while the OutputCommitCoordinator
> doesn't make a distinction between ShuffleMapStages and ResultStages there
> still should not be a performance penalty for this because the extra rounds
> of RPCs should only be performed when necessary.
>
>
> On 8/11/15 2:25 AM, Jeff Zhang wrote:
>
>> As I understand it, OutputCommitCoordinator should only be necessary for
>> ResultStage (especially a ResultStage that writes to HDFS), but currently
>> it is used for all stages. Is there any reason for that?
>>
>> --
>> Best Regards
>>
>> Jeff Zhang
>>
>
>


-- 
Best Regards

Jeff Zhang
