hadoop-mapreduce-issues mailing list archives

From "Chris Douglas (JIRA)" <j...@apache.org>
Subject [jira] Updated: (MAPREDUCE-64) Map-side sort is hampered by io.sort.record.percent
Date Tue, 22 Dec 2009 09:26:30 GMT

     [ https://issues.apache.org/jira/browse/MAPREDUCE-64?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris Douglas updated MAPREDUCE-64:
-----------------------------------

    Attachment: M64-5.patch

Thank you for the detailed comments, Hong.

bq. The logic of calculating the equator seems to be missing a multiplication by METASIZE
bq. In SpillThread: "if (bufend < bufindex && bufindex < bufstart)" should probably be "if (bufend < bufstart) {"
bq. Buffer.write(byte[], int, int): "blockwrite = distkvi < distkve" should be "blockwrite = distkvi <= distkve"

Great catches! Fixed.

bq. A potential inefficiency if we encounter a large record when there are few (but not zero)
records in the buffer - this would lead to these few records written out as a single spill.
A better way is to spill out the single large record, and continue accumulating records after
that.

This is an interesting idea. Clever implementations could also avoid skewing the average record
size disproportionately (possibly an independent issue). Please file a JIRA.

bq. TestMapCollection: uniform random is used [...] Suggest to change to a distribution that
gives more weight to small values

So right. Modified the random test case.
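
One way to weight record sizes toward small values, as a rough sketch only: the class and method names below are hypothetical, and cubing a uniform draw is just one illustrative choice of distribution, not necessarily what the modified TestMapCollection does.

```java
import java.util.Random;

// Hypothetical helper, not taken from TestMapCollection or the patch.
final class SkewedSizes {
    // Returns a size in [0, max) for max > 0. Cubing a uniform draw in
    // [0, 1) pushes most of the mass toward small values while still
    // allowing an occasional large record.
    static int nextSize(Random rnd, int max) {
        double u = rnd.nextDouble();
        return Math.min(max - 1, (int) (max * u * u * u));
    }

    public static void main(String[] args) {
        Random rnd = new Random(42L);
        int max = 1 << 20;
        int samples = 10_000;
        int small = 0;
        for (int i = 0; i < samples; i++) {
            if (nextSize(rnd, max) < max / 2) {
                small++;
            }
        }
        System.out.println(small + " of " + samples + " sizes fall below max/2");
    }
}
```

With this shape, roughly four out of five draws land in the lower half of the range, so small records dominate while the large-record paths still get exercised.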

bq. Any particular reason to shut down the thread in Buffer.flush() rather than Buffer.close()?

Only history. The distinction between flush and close is not clear for a Collector, particularly
since one or the other is a noop for map-only/reducer'd jobs. Pulling the MapOutputBuffer
into a standalone class could help to refine the distinction. Work such as MAPREDUCE-1211
would clearly benefit; IIRC, the current version of that proposal also pulled out the collector.
Filed MAPREDUCE-1324 to track extracting the buffer from MapTask.

bq. I also have a couple of suggestions on refactoring the code to make it more readable [...]

These are all good suggestions. I thought of the index-based code as inferring high-level
abstractions from low-level state, but the {{spillExists}}, {{spillInProgress}} flags distill
a lot of esoteric, often redundant calculation into a more understandable format. There's
another missing abstraction for setting/querying metadata, which could replace the inline
kvmeta manipulations. Since the testing/validation of this patch is difficult, and you've
already done the work, I'd like to postpone this to a separate issue if that's OK.

bq. Other very minor nits [...]

Fixed all these.

Thank you again for so thorough a review.

> Map-side sort is hampered by io.sort.record.percent
> ---------------------------------------------------
>
>                 Key: MAPREDUCE-64
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-64
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>            Reporter: Arun C Murthy
>            Assignee: Chris Douglas
>         Attachments: M64-0.patch, M64-0i.png, M64-1.patch, M64-1i.png, M64-2.patch, M64-2i.png,
M64-3.patch, M64-4.patch, M64-5.patch
>
>
> Currently io.sort.record.percent is a fairly obscure, per-job configurable, expert-level
parameter which controls how much accounting space is available for records in the map-side
sort buffer (io.sort.mb). Typical values for io.sort.mb (100) and io.sort.record.percent
(0.05) imply that we can store ~350,000 records in the buffer before necessitating a sort/combine/spill.
> However, for many applications which deal with small records, e.g. the world-famous wordcount
and its family, this implies we can only use 5-10% of io.sort.mb, i.e. 5-10M, before we spill,
in spite of having _much_ more memory available in the sort buffer. Wordcount, for example,
results in ~12 spills (given an HDFS block size of 64M). The presence of a combiner exacerbates
the problem by piling on serialization/deserialization of records too...
> Sure, jobs can configure io.sort.record.percent, but it's tedious and obscure; we really
can do better by getting the framework to automagically pick it by using all available memory
(up to io.sort.mb) for either the data or accounting.
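
The arithmetic behind the record-count figure in the description can be sketched as follows. This is an illustration, not code from the patch: the class and method names are hypothetical, and the 16 bytes of accounting space per record is an assumption about the pre-patch buffer layout that the message itself does not state. The 20-byte average record used in main is likewise only a guess at wordcount-style output.

```java
// Sketch only: estimates how many records fit in the accounting portion
// of the map-side sort buffer under a fixed io.sort.record.percent.
final class SortBufferEstimate {
    // ASSUMPTION: bytes of accounting metadata kept per record.
    static final int ACCT_BYTES_PER_RECORD = 16;

    // Number of records the accounting space can track before a spill.
    static long maxRecords(int ioSortMb, double recordPercent) {
        long bufferBytes = (long) ioSortMb << 20;               // MB -> bytes
        long acctBytes = Math.round(bufferBytes * recordPercent);
        return acctBytes / ACCT_BYTES_PER_RECORD;
    }

    // Bytes of serialized data held when the accounting space fills up.
    static long dataBytesUsed(long records, int avgRecordBytes) {
        return records * avgRecordBytes;
    }

    public static void main(String[] args) {
        long records = maxRecords(100, 0.05);
        long data = dataBytesUsed(records, 20);                 // 20 B/record is a guess
        System.out.println("records before spill: " + records);
        System.out.println("data bytes buffered at spill: " + data);
    }
}
```

Under these assumptions the accounting space fills after a few hundred thousand records while only a small fraction of io.sort.mb holds data, which is the imbalance the issue describes.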

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

