tez-issues mailing list archives

From "Bikas Saha (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (TEZ-2962) Use per partition stats in shuffle vertex manager auto parallelism
Date Tue, 23 Feb 2016 18:55:18 GMT

    [ https://issues.apache.org/jira/browse/TEZ-2962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15159414#comment-15159414 ]

Bikas Saha commented on TEZ-2962:
---------------------------------

The downside of partition stats is that the values are approximate: they are reported in
buckets of 1 MB, 10 MB, 100 MB, etc. A stat of 100 MB could therefore correspond to as much
as 900 MB of actual data, so respecting the max data size per task can become tricky.
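
To make the concern concrete, here is a minimal sketch of how bucketed stats can underestimate real data size. It is not Tez code: the function names, the power-of-ten bucket boundaries, the rounding-down rule, and the 1024 MB per-task limit are all assumptions chosen for illustration.

```python
# Illustrative sketch only; not the Tez implementation.
# Assumption: a partition's size is reported as the largest power-of-ten
# number of MB that does not exceed the actual size (minimum label 1 MB).

def bucket_label_mb(actual_mb):
    """Return the hypothetical bucket label (in MB) for an actual size."""
    label = 1
    while label * 10 <= actual_mb:
        label *= 10
    return label


def possible_range_mb(label_mb):
    """Return the [low, high) range of actual sizes a label could stand for."""
    return (label_mb, label_mb * 10)


def tasks_needed(total_mb, max_per_task_mb):
    """Ceiling division: tasks required to keep each under the size limit."""
    return -(-total_mb // max_per_task_mb)


# Three upstream tasks each write 900 MB to the same partition.
actual_sizes = [900, 900, 900]
reported = [bucket_label_mb(size) for size in actual_sizes]

max_per_task_mb = 1024  # hypothetical "max data size per task"
estimated_tasks = tasks_needed(sum(reported), max_per_task_mb)
required_tasks = tasks_needed(sum(actual_sizes), max_per_task_mb)

print(reported)         # bucket labels seen by the vertex manager
print(estimated_tasks)  # parallelism chosen from the labels
print(required_tasks)   # parallelism the actual data would need
```

Under these assumptions the labels sum to 300 MB while the actual data is 2700 MB, so a decision based on the labels alone picks too few tasks. Using the upper end of each bucket instead would err in the opposite direction and over-provision.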

> Use per partition stats in shuffle vertex manager auto parallelism
> ------------------------------------------------------------------
>
>                 Key: TEZ-2962
>                 URL: https://issues.apache.org/jira/browse/TEZ-2962
>             Project: Apache Tez
>          Issue Type: Bug
>            Reporter: Bikas Saha
>            Priority: Critical
>
> The original code used the output size sent by completed tasks. Recently, per partition
> stats have been added that provide more granular information. Using partition stats may be
> more accurate and would also remove the duplicate counting of data size in both the
> partition stats and the per task overall size.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
