hadoop-common-dev mailing list archives

From "he yongqiang (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-4845) Shuffle counter issues
Date Fri, 12 Dec 2008 06:40:44 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-4845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12655902#action_12655902 ]

he yongqiang commented on HADOOP-4845:

we could just increment the Counter (REDUCE_INPUT_BYTES) in fetchOutputs rather than have
a member 'reduceInputBytes'.
Thank you for pointing this out; this would be easy to fix.
Maybe we should rename REDUCE_INPUT_BYTES to REDUCE_SHUFFLE_BYTES?
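The change discussed above can be sketched as follows. This is a simplified, hypothetical illustration, not the actual ReduceCopier code: the `Counter` class below is a minimal stand-in for Hadoop's `Counters.Counter`, and the method names are assumptions. The point is that the fetch path increments the counter directly, so no separate `reduceInputBytes` member accumulator is needed.

```java
import java.util.concurrent.atomic.AtomicLong;

public class ShuffleCounterSketch {
    // Minimal stand-in for org.apache.hadoop.mapred.Counters.Counter.
    static class Counter {
        private final AtomicLong value = new AtomicLong();
        void increment(long amount) { value.addAndGet(amount); }
        long getCounter() { return value.get(); }
    }

    private final Counter reduceShuffleBytes = new Counter();

    // Style suggested in the comment: increment the counter directly as
    // each map output is fetched, instead of keeping a member accumulator
    // and copying it into the counter later.
    void onOutputFetched(long bytesFetched) {
        reduceShuffleBytes.increment(bytesFetched);
    }

    long shuffledBytes() {
        return reduceShuffleBytes.getCounter();
    }

    public static void main(String[] args) {
        ShuffleCounterSketch s = new ShuffleCounterSketch();
        s.onOutputFetched(1024);
        s.onOutputFetched(2048);
        System.out.println(s.shuffledBytes()); // prints 3072
    }
}
```

Incrementing the counter at the fetch site also keeps the counter current while the shuffle is still running, rather than only at the end.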

We need to have a counter accounting for the number of bytes FETCHED for each reduce at the end of shuffling.
If the compression was turned on, that should be the number of bytes of the compressed data.

By "the number of bytes FETCHED", do you mean both successful and failed copies, or just the successful ones?
Currently, the added counter REDUCE_INPUT_BYTES records only the bytes successfully fetched from mappers.
If "shuffleInMemory" is used, the counter measures the decompressed data; otherwise, it measures
the compressed data as fetched.
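The distinction described above can be sketched as a small, self-contained example. The class and method names here are hypothetical (the real accounting lives inside ReduceCopier); the sketch only illustrates that an in-memory shuffle counts decompressed bytes while an on-disk shuffle counts the raw compressed bytes.

```java
public class FetchAccounting {
    long counted = 0;

    // Hypothetical helper mirroring the semantics described in the comment:
    // segments shuffled in memory are decompressed first, so the counter
    // sees decompressed bytes; on-disk shuffles count compressed bytes.
    void recordFetch(boolean shuffleInMemory, long compressedLen, long decompressedLen) {
        if (shuffleInMemory) {
            counted += decompressedLen; // in-memory: decompressed size
        } else {
            counted += compressedLen;   // on-disk: compressed size as fetched
        }
    }
}
```

This asymmetry is exactly why the issue asks for the counter to report compressed bytes consistently when compression is enabled.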

We should also estimate the compression ratio of the fetched compressed data and report it.
We should also report the number of segments and the number of bytes written to the local disks
at the end of shuffling.

Are these needed?

At the end of the reduce, we should know the number of records and bytes passed to the reduce.
We already have REDUCE_INPUT_RECORDS for the number of records passed to the reduce.
Maybe we should rename the current REDUCE_INPUT_BYTES to REDUCE_SHUFFLE_BYTES, and use
the name REDUCE_INPUT_BYTES for the bytes passed to the reduce?
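If the rename proposed above were adopted, the missing human-readable entries in Task_Counter.properties might look like the following. The display strings are guesses for illustration, not the committed values, following the existing `COUNTER_NAME.name=...` convention of that file:

```properties
# Hypothetical display names; the committed strings may differ.
REDUCE_SHUFFLE_BYTES.name=Reduce shuffle bytes
REDUCE_INPUT_BYTES.name=Reduce input bytes
```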

> Shuffle counter issues
> ----------------------
>                 Key: HADOOP-4845
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4845
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.20.0
>            Reporter: Chris Douglas
>             Fix For: 0.20.0
> HADOOP-4749 added a new counter tracking the bytes shuffled into the reduce. It adds
an accumulator to ReduceCopier instead of simply incrementing the new counter, and does not
define a human-readable value in src/mapred/org/apache/hadoop/mapred/Task_Counter.properties.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
