hadoop-mapreduce-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Todd Lipcon (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAPREDUCE-3235) Improve CPU cache behavior in map side sort
Date Thu, 20 Oct 2011 19:30:10 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-3235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13131934#comment-13131934
] 

Todd Lipcon commented on MAPREDUCE-3235:
----------------------------------------

bq. -1 for the second. The key distribution of terasort, which results in compare != 0 most
of the time, is an anomaly. In my experience, where skews etc are almost always a fact of
life, more compares return 0 than non-zero.

Definitely worth considering. Like Chris said, this comparison is practically free - we don't
have to delegate to any "proxy objects" as you put it. The proxies have to be 4-byte strings.
Since we already do a compare on the PARTITION part of the metadata, making that comparison
an 8-byte compare instead of a 4-byte compare doesn't really cost anything. So, the only real
cost is the extra accounting overhead in the buffer - 20 bytes per record instead of 16. One
optimization there would be to pack the PARTITION field more tightly - in most MR jobs, we
have <256 reducers, so partition could be a single byte. Since we know the number of reducers
up front, we could easily trade-off space between the partition ID and the comparison proxy.

bq. What's the API look like?

Currently I added a marker interface with a single method: getPrefix(byte[] dst, int off,
int length). If the key type implements this interface, getPrefix() is called by collect()
to copy the comparison proxy into the kvmeta buffer. I was thinking last night that it would
be better to delegate to the Serializer implementation there, though. I just did the above
for expediency last night while hacking this together. Alternatively, it might make sense
to add another interface like how RawComparator is done.

bq. Both of these would be pretty esoteric config knobs

I imagine most of the "stock writables" we have in Hadoop could easily implement this - eg
Text, BytesWritable, LongWritable, etc. Frameworks like Pig/Hive could get it in as well.
Application programmers implementing their own key types already tend to implement RawComparators,
so allowing them to implement another simple method for a good CPU boost doesn't seem too
bad.
                
> Improve CPU cache behavior in map side sort
> -------------------------------------------
>
>                 Key: MAPREDUCE-3235
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3235
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: task
>    Affects Versions: 0.23.0
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>
> When running oprofile on a terasort workload, I noticed that a large amount of CPU usage
was going to MapTask$MapOutputBuffer.compare. Upon disassembling this and looking at cycle
counters, most of the cycles were going to memory loads dereferencing into the array of key-value
data -- implying expensive cache misses. This can be avoided as follows:
> - rather than simply swapping indexes into the kv array, swap the entire meta entries
in the meta array. Swapping 16 bytes is only negligibly slower than swapping 4 bytes. This
requires adding the value-length into the meta array, since we used to rely on the previous-in-the-array
meta entry to determine this. So we replace INDEX with VALUELEN and avoid one layer of indirection.
> - introduce an interface which allows key types to provide a 4-byte comparison proxy.
For string keys, this can simply be the first 4 bytes of the string. The idea is that, if
stringCompare(key1.proxy(), key2.proxy()) != 0, then compare(key1, key2) should have the same
result. If the proxies are equal, the normal comparison method is used. We then include the
4-byte proxy as part of the metadata entry, so that for many cases the indirection into the
data buffer can be avoided.
> On a terasort benchmark, these optimizations plus an optimization to WritableComparator.compareBytes
dropped the aggregate mapside CPU millis by 40%, and the compare() routine mostly dropped
off the oprofile results.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

Mime
View raw message