hadoop-common-dev mailing list archives

From "Arun C Murthy (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-3366) Shuffle/Merge improvements
Date Tue, 13 May 2008 08:04:57 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-3366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12596315#action_12596315 ]

Arun C Murthy commented on HADOOP-3366:
---------------------------------------

Synopsis:

Going through SequenceFile.Sorter.merge with a fine-tooth comb and turning the profiler on it
yielded interesting results.
Tellingly, a reasonably large job we profiled had this characteristic for a reduce which started
_after_ all maps had completed:
shuffle: 13mins
merge: 17mins
reduce: 15mins
Note: merge was also active _while_ shuffle was happening...

So folks get the picture...

----

Prognosis:

1. Epilogue: HADOOP-3365, HADOOP-2095 etc.
2. We really need to tighten the merge code and eliminate copies etc.; HADOOP-2919 did this for
the sort, we need something similar for the merge.

----

Radio-therapy:
1. Eliminate the usage of SequenceFiles completely for the intermediate sort/merge. We just need
to write (key-length, key, value-length, value)* to a compressed stream; we do not need any of
the features provided by SequenceFile, i.e. the header, sync markers etc. (A rough sketch of such
a writer follows this list.)
2. Currently the map-side sort writes out index, index.crc, data and data.crc files. This costs
4 seeks per map-reduce pair, which is 4 * 300,000 * 10,000 = 12 billion seeks assuming a
(slightly futuristic) large job with 300k maps and 10k reduces. We could do much better by
putting the crc at the end of the data file, and a crc for each record in the index, which cuts
the seeks by 50%. Potentially we could keep the index in-memory at the TaskTracker for currently
running jobs, as a future optimization.
3. At the reducer, decompress the (key-length, key, value-length, value)* stream, check the crc
(flag an error if necessary) and keep the data.
4. Throw away RamFS; implement a simple manager that returns byte-arrays of a given size (i.e.
the decompressed shuffle split) until it runs out of the memory available to it (also sketched
after this list).
5. Copy the shuffled data into the byte-array, merge it with other byte-arrays, and write the
merged data to disk after compressing it.
6. Now use raw-comparators on the data in the byte-arrays for optimized compares (a third sketch
below).
This will be a reasonable first step; measure more and optimize later.
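
To make the raw intermediate format of item 1 (and the reducer-side handling in item 3) concrete,
here is a rough sketch of what such a writer could look like. The RawKVWriter name and the use of
java.util.zip's gzip stream as the codec are purely illustrative assumptions, not the actual
implementation:

{code}
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.zip.GZIPOutputStream;

import org.apache.hadoop.io.WritableUtils;

/**
 * Illustrative writer for the proposed intermediate format:
 * (key-length, key, value-length, value)* on a compressed stream,
 * with no SequenceFile header or sync markers.
 */
public class RawKVWriter implements java.io.Closeable {
  private final DataOutputStream out;

  public RawKVWriter(OutputStream rawOut) throws IOException {
    // Any stream codec would do; gzip is used here purely for illustration.
    this.out = new DataOutputStream(new GZIPOutputStream(rawOut));
  }

  public void append(byte[] key, int keyLen, byte[] value, int valueLen)
      throws IOException {
    WritableUtils.writeVInt(out, keyLen);    // key-length
    out.write(key, 0, keyLen);               // key bytes
    WritableUtils.writeVInt(out, valueLen);  // value-length
    out.write(value, 0, valueLen);           // value bytes
  }

  public void close() throws IOException {
    out.close();
  }
}
{code}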
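
For item 4, a minimal sketch of the kind of manager that could replace RamFS; the
ShuffleRamManager name, the single memory budget and the synchronization policy are all
assumptions for illustration:

{code}
/**
 * Illustrative replacement for RamFS: hands out byte-arrays sized to the
 * decompressed shuffle split until the configured memory budget is exhausted.
 */
public class ShuffleRamManager {
  private final long maxMemory;  // total memory available for shuffled data
  private long used;             // memory currently handed out

  public ShuffleRamManager(long maxMemory) {
    this.maxMemory = maxMemory;
  }

  /**
   * Returns a buffer of the requested size, or null if granting it would
   * exceed the budget (the caller then merges/spills to disk instead).
   */
  public synchronized byte[] reserve(int size) {
    if (used + size > maxMemory) {
      return null;
    }
    used += size;
    return new byte[size];
  }

  /** Returns memory to the pool once the buffer has been merged to disk. */
  public synchronized void release(int size) {
    used -= size;
  }
}
{code}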
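
And for item 6, the optimized compares could lean on the existing
WritableComparator.compareBytes helper to compare serialized keys in place, without
deserializing them; the thin wrapper below is only for illustration, and the surrounding merge
loop is elided:

{code}
import org.apache.hadoop.io.WritableComparator;

/** Illustrative raw compare of two serialized keys held in shuffle byte-arrays. */
public class RawKeyCompare {
  /**
   * Compares the key bytes in place; the offsets/lengths locate each
   * serialized key inside its byte-array.
   */
  public static int compare(byte[] bufA, int offA, int lenA,
                            byte[] bufB, int offB, int lenB) {
    return WritableComparator.compareBytes(bufA, offA, lenA, bufB, offB, lenB);
  }
}
{code}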

----

Thoughts?

> Shuffle/Merge improvements
> --------------------------
>
>                 Key: HADOOP-3366
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3366
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: mapred
>            Reporter: Arun C Murthy
>            Assignee: Arun C Murthy
>             Fix For: 0.18.0
>
>
> This is intended to be a meta-issue to track various improvements to shuffle/merge in the reducer.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

