hadoop-mapreduce-issues mailing list archives

From "Todd Lipcon (JIRA)" <j...@apache.org>
Subject [jira] Commented: (MAPREDUCE-64) Map-side sort is hampered by io.sort.record.percent
Date Fri, 16 Oct 2009 23:17:31 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-64?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12766770#action_12766770 ]

Todd Lipcon commented on MAPREDUCE-64:
--------------------------------------

Hi Chris,

I started looking over this yesterday. I haven't had a chance to go through it fully, given
how dense the code is, but here are a few of my initial thoughts:

- In the configuration parameter description, there's the text "Note that this does not imply
any chunking of data to the spill." I'm not sure what this is trying to say. I see that you
copied it from the old description, but I don't think it's very clear; if I can't understand
it even with code-level familiarity, I doubt end users will.
- Consider writing TestMapCollection as a true unit test rather than submitting jobs to the
local cluster; Mockito can probably be helpful here. Or, separate MapOutputBuffer out into
a static class (maybe even a new file?) so it's easier to test in isolation without mocking
up as much. I tend to prefer static inner classes (or non-inner classes) since their scope
is more clearly identifiable, but I understand that's a personal preference. (A rough sketch
of the refactoring follows this list.)
- If you do want to submit jobs to the local runner, set Job.COMPLETION_POLL_INTERVAL_KEY
(mapreduce.client.completion.pollinterval) to 50ms or so instead of the default; this makes
the test about 4x faster on my system. You'll have to set it in a Configuration passed into
new Job() in the tests, since the poll uses the cluster config, not the job config (I think
this is a bug and I'll open a separate JIRA). A snippet follows this list.
- Suggested test: it would be good to have a RecordFactory that generates records of randomized
length, to truly stress test the system. If we can devise a test that generates random
lengths but whose output is still verifiably correct (in terms of contents, not just
total record count), that would be ideal. One possibility is to compute the product of the
checksums of the individual records while generating, and verify that the product remains
the same in the reducer; this would guard against a bug like a duplicated record in the output.
If we had this as a separate (not run by default) stress/fuzz test which takes 20 minutes to
run and throws all kinds of random data through the system, I think that would be fine. (A
sketch of the fingerprint idea follows this list.)
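
For the TestMapCollection suggestion, here's a rough sketch of the static-class shape I have
in mind. The constructor arguments here are hypothetical, not the real MapOutputBuffer
signature; the point is just that a test can construct the collector directly, with no job
submission and no enclosing MapTask instance:

    import org.apache.hadoop.conf.Configuration;

    // Hypothetical sketch only: as a static nested class, the collector
    // no longer reaches through an enclosing MapTask, so a plain JUnit
    // test can construct it directly from a small Configuration.
    public static class MapOutputBuffer<K, V> {
      private final Configuration conf;

      // Dependencies arrive through the constructor rather than via the
      // enclosing instance, so tests can pass in stubs or mocks.
      public MapOutputBuffer(Configuration conf) {
        this.conf = conf;
        // ... allocate kvbuffer and the accounting arrays from conf ...
      }
    }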
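Concretely, the poll-interval tweak looks roughly like this in the test setup:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Job;

    Configuration conf = new Configuration();
    // Poll for completion every 50ms instead of the default. This has to
    // go into the Configuration handed to the Job constructor, because
    // the completion poll reads the cluster config, not the job config.
    conf.setInt("mapreduce.client.completion.pollinterval", 50);
    Job job = new Job(conf);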
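And a sketch of the checksum-product idea (RecordFingerprint is a hypothetical helper, not
existing code). Multiplication wraps mod 2^64 and is commutative, so the record generator and
the reducer can each fold their records in any order and compare results; a dropped or
duplicated record will (almost always) change the product:

    import java.util.zip.CRC32;

    // Hypothetical helper: an order-independent fingerprint of a record
    // stream, usable on both the generating and the reducing side.
    class RecordFingerprint {
      private long product = 1L;

      void add(byte[] record, int off, int len) {
        CRC32 crc = new CRC32();
        crc.update(record, off, len);
        // Force each factor odd so it is invertible mod 2^64 and the
        // running product can never collapse to zero.
        product *= (crc.getValue() | 1L);
      }

      long get() { return product; }
    }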

For the purposes of continued review (and future documentation), I'd love to see some kind
of diagram of the various (I count 9!) pointers into the data buffer. Do you think you could
draw something up, or at least write a sort of mini design doc, either in this JIRA or in the
code? Even just a table of which pointers delineate which buffers, and what direction they
grow in, would be great.


> Map-side sort is hampered by io.sort.record.percent
> ---------------------------------------------------
>
>                 Key: MAPREDUCE-64
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-64
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>            Reporter: Arun C Murthy
>            Assignee: Chris Douglas
>         Attachments: M64-0.patch, M64-1.patch, M64-2.patch
>
>
> Currently io.sort.record.percent is a fairly obscure, per-job configurable, expert-level
> parameter which controls how much accounting space is available for records in the map-side
> sort buffer (io.sort.mb). Typical values for io.sort.mb (100) and io.sort.record.percent
> (0.05) imply that we can store ~350,000 records in the buffer before necessitating a
> sort/combine/spill.
> However, for many applications which deal with small records, e.g. the world-famous WordCount
> and its family, this implies we can only use 5-10% of io.sort.mb, i.e. 5-10MB, before we
> spill, in spite of having _much_ more memory available in the sort buffer. WordCount, for
> example, results in ~12 spills (given an HDFS block size of 64M). The presence of a combiner
> exacerbates the problem by piling on serialization/deserialization of records as well...
> Sure, jobs can configure io.sort.record.percent, but it's tedious and obscure; we really
> can do better by getting the framework to automagically pick it, using all available memory
> (up to io.sort.mb) for either the data or the accounting.
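
A quick note on the ~350,000 figure: assuming the current accounting cost of 16 bytes per
record (one int in kvoffsets plus three ints in kvindices), the arithmetic works out roughly
as:

    100 MB * 0.05          = 5 MB of accounting space
    5 * 2^20 bytes / 16 B  = 327,680 records, on the order of the
                             ~350,000 cited above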


