hadoop-common-dev mailing list archives

From "Ravi Gummadi (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-2774) Add counters to show number of key/values that have been sorted and merged in the maps and reduces
Date Tue, 25 Nov 2008 04:44:47 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-2774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12650457#action_12650457
] 

Ravi Gummadi commented on HADOOP-2774:
--------------------------------------

The wordcount example also works with mapred.job.shuffle.buffer.percent=0 (no small positive
value is needed), even without LocalJobRunner, so I removed the mapred.child.java.opts setting.
Since this testcase uses LocalJobRunner, I also removed the mapred.job.shuffle.buffer.percent setting.

OK. The testcase now sets io.sort.record.percent and io.sort.spill.percent to their default
values.

OK. The testcase now removes the testdir as a whole (not each file separately), and it removes
the testdir even when the test fails.
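
As a rough illustration of that cleanup pattern (not the code from the patch), the sketch below removes a whole test directory in a finally block so that it also runs when the test body fails. The class and method names are hypothetical, and it uses java.nio.file instead of Hadoop's FileSystem API to stay self-contained.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.stream.Stream;

// Hypothetical helper for illustration; the actual testcase is not shown in this thread.
class TestDirCleanupSketch {

    // Deletes dir and everything under it; reverse path order removes children before parents.
    static void deleteRecursively(Path dir) {
        if (!Files.exists(dir)) {
            return;
        }
        try (Stream<Path> paths = Files.walk(dir)) {
            paths.sorted(Comparator.reverseOrder()).forEach(p -> {
                try {
                    Files.delete(p);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Runs the test body, then removes the whole testdir even if the body throws.
    static void runWithTestDir(Path testDir, Runnable body) {
        try {
            Files.createDirectories(testDir);
            body.run();
        } finally {
            deleteRecursively(testDir);
        }
    }
}
```

A failing body still propagates its exception to the caller; the finally block only guarantees that the directory is gone afterwards.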

I just didn't want to change every call to the IFile.Reader and IFile.Writer constructors
in all files to pass this extra parameter (null in most cases), so I added new constructors.
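
A minimal sketch of that approach, assuming the existing constructor simply delegates to the new one with null. SketchWriter and its AtomicLong counter are hypothetical stand-ins; the real IFile.Writer signatures and counter type are not shown in this thread.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical stand-in for a writer class; not the real IFile.Writer.
class SketchWriter {
    private final StringBuilder out;
    private final AtomicLong recordCounter; // null when the caller does not want counting

    // Existing constructor: callers that do not care about the counter stay unchanged.
    SketchWriter(StringBuilder out) {
        this(out, null);
    }

    // New constructor: takes the extra (optional) counter parameter.
    SketchWriter(StringBuilder out, AtomicLong recordCounter) {
        this.out = out;
        this.recordCounter = recordCounter;
    }

    // Writes one tab-separated record and bumps the counter if one was supplied.
    void append(String key, String value) {
        out.append(key).append('\t').append(value).append('\n');
        if (recordCounter != null) {
            recordCounter.incrementAndGet();
        }
    }
}
```

With this shape, only the call sites that need the counter switch to the two-argument constructor.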

OK. Added "{ }" for the if statements.

Map First Level Spills: Runping wanted this counter. Quoting Runping Qi (27/Oct/08 11:32 AM):
"it tells me how effective is the spill thread. It also gives me some hint as to how to
optimize my heapsize/sort.mb setting."

> Add counters to show number of key/values that have been sorted and merged in the maps
and reduces
> --------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-2774
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2774
>             Project: Hadoop Core
>          Issue Type: Bug
>            Reporter: Owen O'Malley
>            Assignee: Ravi Gummadi
>             Fix For: 0.20.0
>
>         Attachments: HADOOP-2774.patch, HADOOP-2774.patch, HADOOP-2774.patch, HADOOP-2774.patch
>
>
> For each *pass* of the sort and merge, I would like a count of the number of records.
So for example, if the map output 100 records and they were sorted once, the counter would
be 100. If it spilled twice and was merged together, it would be 200. Clearly in a multi-level
merge, it may not be a multiple of the number of map output records. This would let the users
easily see if they have values like io.sort.mb or io.sort.factor set too low.
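
The arithmetic in the description above can be sketched as a small model. This is an illustration only, not code from the patch: the class name is hypothetical, and the merge policy (each pass merges the `factor` smallest segments) is a simplifying assumption.

```java
import java.util.PriorityQueue;

// Hypothetical model for illustration only; not code from the HADOOP-2774 patch.
class SortMergePassCounter {

    // Sort pass: every map output record is sorted once before it is spilled.
    static long sortedRecords(long[] spillSizes) {
        long total = 0;
        for (long size : spillSizes) {
            total += size;
        }
        return total;
    }

    // Merge passes: each pass merges up to `factor` of the smallest segments
    // (assumed policy) and counts every record it reads.
    static long mergedRecords(long[] spillSizes, int factor) {
        if (factor < 2) {
            throw new IllegalArgumentException("factor must be at least 2");
        }
        PriorityQueue<Long> segments = new PriorityQueue<>();
        for (long size : spillSizes) {
            segments.add(size);
        }
        long counted = 0;
        while (segments.size() > 1) {
            long merged = 0;
            for (int i = 0; i < factor && !segments.isEmpty(); i++) {
                merged += segments.poll();
            }
            counted += merged;
            segments.add(merged);
        }
        return counted;
    }

    // Total counter value: records seen by the sort pass plus all merge passes.
    static long totalCounted(long[] spillSizes, int factor) {
        return sortedRecords(spillSizes) + mergedRecords(spillSizes, factor);
    }

    public static void main(String[] args) {
        System.out.println(totalCounted(new long[] {100}, 10));        // 100: sorted once, no merge
        System.out.println(totalCounted(new long[] {50, 50}, 10));     // 200: two spills, one merge
        System.out.println(totalCounted(new long[] {40, 30, 30}, 2));  // 260: multi-level merge
    }
}
```

In the last case the first merge pass reads 60 records and the second reads 100, so the total of 260 is not a multiple of the 100 map output records.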

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

