hadoop-mapreduce-issues mailing list archives

From "Wilfred Spiegelenburg (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAPREDUCE-7072) mapred job -history prints duplicate counter in human output
Date Wed, 04 Apr 2018 14:06:00 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-7072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16425564#comment-16425564 ]

Wilfred Spiegelenburg commented on MAPREDUCE-7072:

The root cause of the issue is in the {{getGroupNames()}} method of {{AbstractCounters}}.

When you step through the code in a debugger, the number of counter groups returned is higher
than expected. This is because we add the deprecated counter group names to the list
of counter group names before we return. The display names of the counters tracked
in the deprecated list, stored in the {{legacyMap}}, are the same as the display names of the
non-deprecated counters. The deprecated counters that are added are already in the non-deprecated
list, which causes the duplication.
It works in the JSON format because that format internally uses a HashMap keyed on the name
of the counter group. The keys clash, so the existing value is overwritten
with the value from the deprecated entry.
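
The effect described above can be reproduced in isolation. The following is only a minimal sketch, not the Hadoop code: the class {{CounterDuplicationSketch}}, its maps and its methods are hypothetical stand-ins, and the group names are used purely as illustrative strings. It shows a name list that also contains a deprecated alias producing two identical rows in a list-based rendering, while a HashMap keyed on the display name collapses them into one entry.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, self-contained model of the behavior described in the comment.
public class CounterDuplicationSketch {
    // Assumed stand-in for the real groups: current group name -> display name.
    static final Map<String, String> GROUPS = new LinkedHashMap<>();
    // Assumed stand-in for the legacy map: deprecated group name -> current group name.
    static final Map<String, String> LEGACY = new LinkedHashMap<>();

    static {
        GROUPS.put("org.apache.hadoop.mapreduce.JobCounter", "Job Counters");
        LEGACY.put("org.apache.hadoop.mapred.JobInProgress$Counter",
                   "org.apache.hadoop.mapreduce.JobCounter");
    }

    // Returns the current names plus any deprecated alias whose target group exists.
    static List<String> groupNamesWithLegacy() {
        List<String> names = new ArrayList<>(GROUPS.keySet());
        for (Map.Entry<String, String> e : LEGACY.entrySet()) {
            if (GROUPS.containsKey(e.getValue())) {
                names.add(e.getKey());
            }
        }
        return names;
    }

    // A deprecated name resolves to the same group, hence the same display name.
    static String displayName(String groupName) {
        String current = LEGACY.getOrDefault(groupName, groupName);
        return GROUPS.get(current);
    }

    // List-based rendering: one row per returned name, so the alias adds a duplicate row.
    static List<String> humanRows() {
        List<String> rows = new ArrayList<>();
        for (String name : groupNamesWithLegacy()) {
            rows.add("|" + displayName(name) + " |");
        }
        return rows;
    }

    // Map-based rendering: the second put overwrites the first because the keys clash.
    static Map<String, String> jsonView() {
        Map<String, String> byDisplayName = new HashMap<>();
        for (String name : groupNamesWithLegacy()) {
            byDisplayName.put(displayName(name), name);
        }
        return byDisplayName;
    }

    public static void main(String[] args) {
        System.out.println("human rows: " + humanRows().size());
        System.out.println("json entries: " + jsonView().size());
    }
}
```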

To track where this issue is coming from: MAPREDUCE-4053 changed the iteration to work for
Oozie and seems related to OOZIE-777 and the HadoopELFunctions, which still seem to use the
deprecated counter name.
Changing what the method returns is thus not possible without breaking Oozie. We can use the
iterator that can be returned by the abstract counters as it does not include the deprecated
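
The difference between the two access paths can be sketched as follows. This is an assumption-laden illustration, not the actual patch: {{GroupIterationSketch}} is a hypothetical class whose {{getGroupNames()}} mimics the behavior described above, and whose iterator walks only the real groups. The method that callers such as Oozie rely on stays unchanged, and only the printing code switches to the iterator.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model: iterating the container yields the display names of real groups only.
public class GroupIterationSketch implements Iterable<String> {
    // Assumed stand-in for the real groups: current group name -> display name.
    private final Map<String, String> groups = new LinkedHashMap<>();
    // Assumed stand-in for the legacy map: deprecated group name -> current group name.
    private final Map<String, String> legacy = new LinkedHashMap<>();

    public GroupIterationSketch() {
        groups.put("org.apache.hadoop.mapreduce.JobCounter", "Job Counters");
        legacy.put("org.apache.hadoop.mapred.JobInProgress$Counter",
                   "org.apache.hadoop.mapreduce.JobCounter");
    }

    // Left untouched for existing callers: still includes deprecated aliases.
    public List<String> getGroupNames() {
        List<String> names = new ArrayList<>(groups.keySet());
        for (Map.Entry<String, String> e : legacy.entrySet()) {
            if (groups.containsKey(e.getValue())) {
                names.add(e.getKey());
            }
        }
        return names;
    }

    // Iterates the stored groups directly, so no deprecated alias appears.
    @Override
    public Iterator<String> iterator() {
        return groups.values().iterator();
    }

    // Printing via the iterator gives one row per real group.
    public List<String> rowsViaIterator() {
        List<String> rows = new ArrayList<>();
        for (String displayName : this) {
            rows.add("|" + displayName + " |");
        }
        return rows;
    }

    public static void main(String[] args) {
        GroupIterationSketch counters = new GroupIterationSketch();
        System.out.println("names: " + counters.getGroupNames().size());
        System.out.println("rows via iterator: " + counters.rowsViaIterator().size());
    }
}
```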

> mapred job -history prints duplicate counter in human output
> ------------------------------------------------------------
>                 Key: MAPREDUCE-7072
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7072
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: client
>    Affects Versions: 3.0.0
>            Reporter: Wilfred Spiegelenburg
>            Assignee: Wilfred Spiegelenburg
>            Priority: Major
> The 'mapred job -history' command prints duplicate entries for counters, but only for the human
> output format. It does not do this for the JSON format.
> mapred job -history /user/history/somefile.jhist -format human
> {code}
> ....
> |Job Counters |Total megabyte-seconds taken by all map tasks|0 |0 |268,288,000
> ...
> |Job Counters |Total megabyte-seconds taken by all map tasks|0 |0 |268,288,000
> ....
> {code}

