Reply-To: mapreduce-issues@hadoop.apache.org
Date: Thu, 2 Oct 2014 00:15:34 +0000 (UTC)
From: "Joep Rottinghuis (JIRA)" <jira@apache.org>
To: mapreduce-issues@hadoop.apache.org
Message-ID: <JIRA.12711783.1398974585000.171387.1412208934376@Atlassian.JIRA>
In-Reply-To: <JIRA.12711783.1398974585000@Atlassian.JIRA>
References: <JIRA.12711783.1398974585000@Atlassian.JIRA>
 <JIRA.12711783.1398974585307@arcas>
Subject: [jira] [Commented] (MAPREDUCE-5873) Measure bw of a single copy
 call and display the correct aggregated bw


    [ https://issues.apache.org/jira/browse/MAPREDUCE-5873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14155848#comment-14155848 ] 

Joep Rottinghuis commented on MAPREDUCE-5873:
---------------------------------------------

[~jlowe] are you guys seeing this behavior as well?
We're finding that the way the shuffle bandwidth is currently shown is not only useless but also confusing for our users.
If a mapper has to be re-run, the bandwidth appears to be tiny.
This new way of showing bandwidth gives a much better indication of what is going on, and if it is very small, that actually indicates that something is wrong in the network.
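
To make the re-run effect concrete, here is a small worked example. It uses hypothetical numbers, not measurements from any cluster: a reducer fetches 1000 MB in 100 s of copying, waits 900 s for a re-run mapper, then fetches that mapper's 10 MB in 1 s. The class and method names are illustrative only and are not Hadoop's ShuffleScheduler code.

```java
// Hypothetical scenario comparing the current shuffle bandwidth definition,
// bw = (bytes copied so far) / (total time in the copy phase so far),
// with a variant that divides only by the time copies were actually running.
public class ShuffleBwExample {

    /** Current definition: idle time inside the copy phase counts against bandwidth. */
    public static double currentBw(long bytesCopied, double copyPhaseSeconds) {
        return bytesCopied / copyPhaseSeconds;
    }

    /** Variant: only time spent actually copying is in the denominator. */
    public static double busyBw(long bytesCopied, double busySeconds) {
        return bytesCopied / busySeconds;
    }

    public static void main(String[] args) {
        final long mb = 1024L * 1024L;
        // Hypothetical: 1000 MB in 100 s, a 900 s wait for a re-run mapper, then 10 MB in 1 s.
        long bytes = 1000 * mb + 10 * mb;
        double busySeconds = 100.0 + 1.0;
        double copyPhaseSeconds = busySeconds + 900.0;

        System.out.printf("current definition: %.2f MB/s%n", currentBw(bytes, copyPhaseSeconds) / mb);
        System.out.printf("busy time only:     %.2f MB/s%n", busyBw(bytes, busySeconds) / mb);
    }
}
```

With these numbers the current definition reports about 1.01 MB/s, even though data moved at 10 MB/s whenever a copy was actually running; the 900 s wait for the re-run mapper dominates the denominator.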


> Measure bw of a single copy call and display the correct aggregated bw
> ----------------------------------------------------------------------
>
>                 Key: MAPREDUCE-5873
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5873
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>    Affects Versions: 2.3.0
>            Reporter: Siqi Li
>            Assignee: Siqi Li
>         Attachments: MAPREDUCE-5873.v1.patch, MAPREDUCE-5873.v2.patch, MAPREDUCE-5873.v3.patch
>
>
> Currently, the ShuffleScheduler displays bandwidth in the ReduceTask JVM status. Its definition, however, is confusing because it includes time during which nothing is being copied, such as the pause while waiting for the next wave of map outputs to become available.
> The current bw is defined as (bytes copied so far) / (total time in the copy phase so far).
> It would be more useful
> 1) to measure the bandwidth of a single copy call, and
> 2) to display the aggregated bw over only the time during which at least one fetcher is in a copy call.
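
The two proposals in the description can be sketched as follows. This is a minimal, hypothetical illustration, not the attached MAPREDUCE-5873 patch: the class CopyBandwidthTracker and its methods are invented names, and timestamps are passed in by the caller so the logic can be tested. It records the bandwidth of each copy call, and it accumulates "busy" time only while at least one fetcher is inside a copy call, so idle gaps between map-output waves do not dilute the aggregate.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch (not the MAPREDUCE-5873 patch). Tracks:
 *  1) the bandwidth of each individual copy call, and
 *  2) an aggregated bandwidth whose denominator is only the wall-clock time
 *     during which at least one fetcher is inside a copy call.
 */
public class CopyBandwidthTracker {
    private int activeFetchers = 0;   // fetchers currently inside a copy call
    private long busySinceMs = -1;    // start of the current ">= 1 active fetcher" interval
    private long busyMs = 0;          // total closed busy time so far
    private long totalBytes = 0;      // bytes copied by all finished copy calls
    private final List<Double> perCopyBw = new ArrayList<>(); // bytes/s, one entry per copy call

    /** Called by a fetcher thread when it enters a copy call. */
    public synchronized void copyStarted(long nowMs) {
        if (activeFetchers == 0) {
            busySinceMs = nowMs; // a new busy interval begins
        }
        activeFetchers++;
    }

    /** Called when a copy call that started at startMs finishes after moving `bytes`. */
    public synchronized void copyFinished(long startMs, long nowMs, long bytes) {
        if (activeFetchers == 0) {
            throw new IllegalStateException("copyFinished without matching copyStarted");
        }
        long elapsedMs = Math.max(1, nowMs - startMs); // guard against a zero-length call
        perCopyBw.add(bytes * 1000.0 / elapsedMs);
        totalBytes += bytes;
        activeFetchers--;
        if (activeFetchers == 0) {
            busyMs += nowMs - busySinceMs; // close the busy interval
            busySinceMs = -1;
        }
    }

    /** Aggregated bytes/s over busy time only; idle gaps do not count. */
    public synchronized double aggregatedBw(long nowMs) {
        long busy = busyMs + (activeFetchers > 0 ? nowMs - busySinceMs : 0);
        return busy <= 0 ? 0.0 : totalBytes * 1000.0 / busy;
    }

    public synchronized List<Double> perCopyBandwidths() {
        return new ArrayList<>(perCopyBw);
    }

    public static void main(String[] args) {
        final long mb = 1024L * 1024L;
        CopyBandwidthTracker tracker = new CopyBandwidthTracker();
        // Two overlapping copies (busy from 0 to 1500 ms), an idle gap, then one more copy.
        tracker.copyStarted(0);
        tracker.copyStarted(500);
        tracker.copyFinished(0, 1000, 10 * mb);
        tracker.copyFinished(500, 1500, 10 * mb);
        tracker.copyStarted(10_000);
        tracker.copyFinished(10_000, 11_000, 10 * mb);
        System.out.printf("aggregated: %.2f MB/s%n", tracker.aggregatedBw(11_000) / mb);
        for (double bw : tracker.perCopyBandwidths()) {
            System.out.printf("single copy: %.2f MB/s%n", bw / mb);
        }
    }
}
```

In this hypothetical run, each copy call reports 10 MB/s and the aggregate is 30 MB over 2.5 s of busy time, i.e. 12 MB/s, whereas dividing by the full 11 s would give about 2.7 MB/s. One design caveat of the sketch: bytes are credited only when a copy call finishes, so the aggregate can dip while long copies are in flight; an implementation could instead credit bytes incrementally.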


--
This message was sent by Atlassian JIRA
(v6.3.4#6332)