hadoop-mapreduce-issues mailing list archives

From "Gopal V (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAPREDUCE-3235) Improve CPU cache behavior in map side sort
Date Wed, 07 Nov 2012 23:40:14 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-3235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13492822#comment-13492822 ]

Gopal V commented on MAPREDUCE-3235:

Yes, approximately 90M.

The patch I've put up does not break backward compatibility yet, because none of the regular
WritableComparators have been modified.

This patch could be useful for a large common subset of cases (i.e. where the ordering is the
natural byte ordinal) but might be detrimental to a few corner cases. It is not a breaking change
when I put it in a custom Comparator instead of making it built-in - the breakage would only
happen if Text.Comparator itself implemented the feature.

BinaryComparable does export getPrefix(), but MapTask does not check for it; it only serves
as a short-cut to the data for the custom Comparator class.
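The custom-Comparator approach above might be sketched roughly as below: compare a 4-byte prefix first and fall back to the full key bytes only on a tie. This is a minimal sketch, not the actual patch; the getPrefix() packing and the RawComparator-style signature are illustrative assumptions.

```java
public class PrefixRawComparatorSketch {

    // Pack the first 4 bytes (big-endian, zero-padded) into an int so that
    // unsigned int comparison agrees with natural byte-ordinal ordering.
    static int getPrefix(byte[] b, int start, int len) {
        int p = 0;
        for (int i = 0; i < 4; i++) {
            p = (p << 8) | (i < len ? (b[start + i] & 0xFF) : 0);
        }
        return p;
    }

    // RawComparator-style signature: (buffer, offset, length) per key.
    static int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
        int pc = Integer.compareUnsigned(getPrefix(b1, s1, l1),
                                         getPrefix(b2, s2, l2));
        if (pc != 0) {
            return pc; // decided without re-reading the key bytes
        }
        // Prefix tie: full lexicographic comparison over the key bytes.
        int n = Math.min(l1, l2);
        for (int i = 0; i < n; i++) {
            int d = (b1[s1 + i] & 0xFF) - (b2[s2 + i] & 0xFF);
            if (d != 0) return d;
        }
        return l1 - l2;
    }
}
```

This only preserves ordering when the key type sorts by natural byte ordinal, which is why it lives in a custom Comparator rather than in Text.Comparator.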

I'm running the following as my primary test


The 90M version failed to register any meaningful CPU numbers (running with LocalJobRunner) - the
prefix-sorted version did show less GC time spent overall.

I ran with 5x the amount of data to see if it triggered any differences in the numbers:

|| - || trunk || +hashcomparator || %change ||
|  time |  204992 |  193958 | -5.3% |
| compares | 1382152895 | 410352373 | -70% |

But just comparing the output from the time command:

trunk = 207.70user 9.29system 3:26.76elapsed 104%CPU 
patched = 193.89user 9.37system 3:15.68elapsed 103%CPU

The performance gain has more to do with avoiding the interface dispatch involved in calling the
comparator. As best I could tell, an interface dispatch in the JVM is a bit more expensive than
a direct class call.
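A toy illustration of the dispatch difference alluded to here: calling through an interface reference compiles to invokeinterface, while calling through a concrete class reference compiles to invokevirtual with a statically known receiver, which the JIT can typically devirtualize and inline more readily. The results are identical; only the call path differs. The class and method names are hypothetical.

```java
import java.util.Comparator;

public class DispatchSketch {
    static final class IntCmp implements Comparator<Integer> {
        public int compare(Integer a, Integer b) { return Integer.compare(a, b); }
    }

    // Call site sees only the interface type: invokeinterface dispatch.
    static int viaInterface(Comparator<Integer> c, int a, int b) {
        return c.compare(a, b);
    }

    // Call site sees the concrete class: invokevirtual, easier to inline.
    static int viaClass(IntCmp c, int a, int b) {
        return c.compare(a, b);
    }
}
```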

The kvmeta being copied 16 bytes at a time is actually slower during the sort, but my data set
is perhaps skewed towards long sorted subsequences (i.e. 500 of them already ordered), causing
more swaps per compare than something more real-life would.
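The 16-byte-at-a-time copy above can be sketched as swapping whole metadata entries inside one int[] (four ints per record) rather than swapping 4-byte indexes into a separate array. The field layout implied here is illustrative, not the exact MapOutputBuffer layout.

```java
public class MetaSwapSketch {
    static final int NMETA = 4; // ints per metadata entry (16 bytes)

    // Swap the complete metadata entries for records i and j in place,
    // so a later compare reads contiguous, just-touched cache lines
    // instead of chasing an index into a distant kv buffer.
    static void swapMeta(int[] kvmeta, int i, int j) {
        for (int k = 0; k < NMETA; k++) {
            int tmp = kvmeta[i * NMETA + k];
            kvmeta[i * NMETA + k] = kvmeta[j * NMETA + k];
            kvmeta[j * NMETA + k] = tmp;
        }
    }
}
```

The trade-off matches the observation above: each swap moves 16 bytes instead of 4, so workloads with many swaps per compare pay more for the copy than they save on the compare.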
> Improve CPU cache behavior in map side sort
> -------------------------------------------
>                 Key: MAPREDUCE-3235
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3235
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: performance, task
>    Affects Versions: 0.23.0
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>         Attachments: hashed-sort-MAPREDUCE-3235.patch, map_sort_perf.diff, mr-3235-poc.txt
> When running oprofile on a terasort workload, I noticed that a large amount of CPU usage
was going to MapTask$MapOutputBuffer.compare. Upon disassembling this and looking at cycle
counters, most of the cycles were going to memory loads dereferencing into the array of key-value
data -- implying expensive cache misses. This can be avoided as follows:
> - rather than simply swapping indexes into the kv array, swap the entire meta entries
in the meta array. Swapping 16 bytes is only negligibly slower than swapping 4 bytes. This
requires adding the value-length into the meta array, since we used to rely on the previous-in-the-array
meta entry to determine this. So we replace INDEX with VALUELEN and avoid one layer of indirection.
> - introduce an interface which allows key types to provide a 4-byte comparison proxy.
For string keys, this can simply be the first 4 bytes of the string. The idea is that, if
stringCompare(key1.proxy(), key2.proxy()) != 0, then compare(key1, key2) should have the same
result. If the proxies are equal, the normal comparison method is used. We then include the
4-byte proxy as part of the metadata entry, so that for many cases the indirection into the
data buffer can be avoided.
> On a terasort benchmark, these optimizations plus an optimization to WritableComparator.compareBytes
dropped the aggregate mapside CPU millis by 40%, and the compare() routine mostly dropped
off the oprofile results.
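The proxy contract described in the issue - if the 4-byte proxies differ, the full comparison must already agree with the proxy comparison - can be checked with a small property test. This sketch assumes byte-lexicographic keys and a big-endian, zero-padded proxy packing; the names are illustrative.

```java
public class ProxyContractSketch {
    // Pack the first 4 bytes of the key (big-endian, zero-padded).
    static int proxy(byte[] key) {
        int p = 0;
        for (int i = 0; i < 4; i++) {
            p = (p << 8) | (i < key.length ? (key[i] & 0xFF) : 0);
        }
        return p;
    }

    // Plain lexicographic byte comparison over the whole key.
    static int fullCompare(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int d = (a[i] & 0xFF) - (b[i] & 0xFF);
            if (d != 0) return d;
        }
        return a.length - b.length;
    }

    // Contract: if the proxies differ (compared as unsigned ints, i.e.
    // byte order), the sign must match the full comparison; on a tie
    // the full comparison is consulted, so no claim is made.
    static boolean contractHolds(byte[] a, byte[] b) {
        int pc = Integer.compareUnsigned(proxy(a), proxy(b));
        if (pc == 0) return true;
        return Integer.signum(pc) == Integer.signum(fullCompare(a, b));
    }
}
```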

This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
