hadoop-mapreduce-issues mailing list archives

From "nijel (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAPREDUCE-6487) TaskCounter.MAP_OUTPUT_BYTES is 0 for map-only jobs
Date Sat, 10 Oct 2015 05:39:05 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-6487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14951623#comment-14951623 ]

nijel commented on MAPREDUCE-6487:

Thanks for the comments, and sorry for the delay.

The code here is a bit different from MapOutputBuffer, where the bytes are calculated from
the serialized key and value buffers.
In DirectMapOutputCollector, the key and value are written directly to the RecordWriter:
{code}
public void collect(K key, V value, int partition) throws IOException {
  long bytesOutPrev = getOutputBytes(fsStats);
  out.write(key, value);
  long bytesOutCurr = getOutputBytes(fsStats);
  fileOutputByteCounter.increment(bytesOutCurr - bytesOutPrev);
}
{code}

I don't see how to get the byte count from the RecordWriter, so I am a bit stuck on this.

Feel free to take this up if you have a solution.
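
For illustration only, here is a minimal, Hadoop-free sketch of the "measure before, write, measure after" pattern that collect() already uses for fileOutputByteCounter, applied to a per-record byte count. CountingOutputStream, DeltaCountingCollector and DeltaCounterDemo are hypothetical names invented for this sketch and are not Hadoop classes. It assumes the writer's underlying stream can be wrapped, which may not hold for an arbitrary RecordWriter, so treat it as one possible direction and not as a fix.

```java
import java.io.ByteArrayOutputStream;
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper: counts every byte that passes through the wrapped stream. */
class CountingOutputStream extends FilterOutputStream {
    private long count;

    CountingOutputStream(OutputStream out) {
        super(out);
    }

    @Override
    public void write(int b) throws IOException {
        out.write(b);
        count++;
    }

    @Override
    public void write(byte[] b, int off, int len) throws IOException {
        // Write straight to the wrapped stream so bytes are not counted twice.
        out.write(b, off, len);
        count += len;
    }

    long getCount() {
        return count;
    }
}

/** Hypothetical collector: the two long fields stand in for the task counters. */
class DeltaCountingCollector {
    private final CountingOutputStream counted =
        new CountingOutputStream(new ByteArrayOutputStream());
    private long mapOutputBytes;    // stand-in for MAP_OUTPUT_BYTES
    private long mapOutputRecords;  // stand-in for MAP_OUTPUT_RECORDS

    void collect(String key, String value) {
        long before = counted.getCount();
        try {
            // Stand-in for out.write(key, value): a tab-separated text record.
            counted.write((key + "\t" + value + "\n").getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        long after = counted.getCount();
        mapOutputBytes += after - before;  // same delta pattern as in collect() above
        mapOutputRecords++;
    }

    long getMapOutputBytes() {
        return mapOutputBytes;
    }

    long getMapOutputRecords() {
        return mapOutputRecords;
    }
}

class DeltaCounterDemo {
    public static void main(String[] args) {
        DeltaCountingCollector collector = new DeltaCountingCollector();
        collector.collect("a", "1");
        collector.collect("bb", "22");
        System.out.println("bytes=" + collector.getMapOutputBytes()
            + " records=" + collector.getMapOutputRecords());
    }
}
```

The sketch counts the bytes of the record as it is written out, which is not necessarily the same number MapOutputBuffer would report from its serialization buffers.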

Test code to reproduce the issue:
{code:title=TestJobCounters.testOldCounter() }

  public void testOldCounter() throws Exception {

    JobConf conf = new JobConf(TestOldJobCounters.class);

    conf.setBoolean("mapred.mapper.new-api", false);
    conf.setInt(JobContext.IO_SORT_FACTOR, 2);
    removeWordsFile(inFiles[4], conf);
    FileInputFormat.setInputPaths(conf, IN_DIR);
    FileOutputFormat
        .setOutputPath(conf, new Path(OUT_DIR, "output_oldcounter"));

    RunningJob myJob = JobClient.runJob(conf);
    Counters c1 = myJob.getCounters();
    Counter findCounter = c1.findCounter(TaskCounter.MAP_OUTPUT_BYTES);
    System.out.println("findCounter   :  " + findCounter);

    Assert.assertTrue("MAP_OUTPUT_BYTES is not populated",
        findCounter.getValue() > 0);
  }
{code}

> TaskCounter.MAP_OUTPUT_BYTES is 0 for map-only jobs
> ---------------------------------------------------
>                 Key: MAPREDUCE-6487
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6487
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: task
>            Reporter: Laurent Goujon
>            Assignee: nijel
> It looks like DirectMapOutputCollector (used by map-only jobs) doesn't update TaskCounter.MAP_OUTPUT_BYTES,
> although it updates TaskCounter.MAP_OUTPUT_RECORDS.

This message was sent by Atlassian JIRA
