hadoop-mapreduce-dev mailing list archives

From Fusheng Han <fsh....@gmail.com>
Subject Re: Images is not visible of gmail?//Map-Balance-Reduce draft
Date Mon, 08 Feb 2010 14:51:59 GMT
Hi, jian

No attachments can be seen in gmail.

On Mon, Feb 8, 2010 at 5:23 PM, jian yi <eyjian@gmail.com> wrote:
> Two targets:
> 1. Solving the skew problem
> 2. Regarding a task as a timeslice to improve the scheduler, switching from
> one job to another by timeslice.
> In the MR (Map-Reduce) model, the reduce tasks are not balanced, because the
> sizes of the partitions are unbalanced. How to balance them? We can control
> the size of a partition: rehash the bigger partitions and combine them to the
> specified size. If a key has many values, it's necessary to execute MapReduce
> twice. The following is the model diagram:
> mbr1.jpg (attachment)
> The scheduler can regard a task as a timeslice, similar to an OS scheduler.
> If a split is bigger than a specified size, it will be split again. If a
> split is smaller than the specified size, it will be combined with others; we
> can name the combining procedure regroup. The combining is logical: it's not
> necessary to combine these smaller splits into a disk file, so it will not
> affect performance. The target is that every task spends the same time
> running.
> mbr2.jpg (attachment)
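
The regroup step quoted above (split anything larger than the specified size, then logically combine the smaller pieces) could be sketched roughly as follows. This is only an illustration, not code from the draft or from Hadoop: the function name `regroup`, the `(split_id, offset, length)` piece layout, and the first-fit-decreasing packing rule are all assumptions chosen for the example. The same idea would apply to oversized reduce partitions.

```python
from typing import Dict, List, Tuple

# Hypothetical representation: a piece is a logical slice of an input split,
# described as (split_id, offset, length). Nothing is copied on disk.
Piece = Tuple[str, int, int]


def regroup(splits: Dict[str, int], target: int) -> List[List[Piece]]:
    """Cut splits larger than `target` into pieces, then pack the pieces
    into groups whose total length is at most `target`."""
    if target <= 0:
        raise ValueError("target must be positive")

    # Step 1: split again anything bigger than the specified size.
    pieces: List[Piece] = []
    for split_id, size in splits.items():
        offset = 0
        while offset < size:
            length = min(target, size - offset)
            pieces.append((split_id, offset, length))
            offset += length

    # Step 2: combine small pieces logically (first-fit decreasing,
    # an assumed heuristic; the draft does not name one).
    pieces.sort(key=lambda p: p[2], reverse=True)
    groups: List[List[Piece]] = []
    loads: List[int] = []
    for piece in pieces:
        for i, load in enumerate(loads):
            if load + piece[2] <= target:
                groups[i].append(piece)
                loads[i] += piece[2]
                break
        else:
            # No existing group has room, so open a new one.
            groups.append([piece])
            loads.append(piece[2])
    return groups


# Example: four splits of uneven size, target task size of 100 units.
groups = regroup({"a": 250, "b": 30, "c": 70, "d": 100}, target=100)
loads = [sum(length for _, _, length in g) for g in groups]
print(loads)  # [100, 100, 100, 100, 50]
```

Each resulting group would correspond to one task, so most tasks process close to the same amount of input. Equal input size is only a proxy for equal running time, which is the actual target stated in the draft.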
