hadoop-mapreduce-dev mailing list archives

From jian yi <eyj...@gmail.com>
Subject Re: MBR model diagram (Map-Balance-Reduce)
Date Sun, 07 Feb 2010 04:39:42 GMT
If a split is bigger than a specified size, it will be split again. If a
split is smaller than a specified size, it will be combined with others; we
can call this combining procedure "regroup". The combining is logical: it is
not necessary to write these smaller splits out to a single disk file, so it
does not affect performance. The goal is for every task to take the same time.
[image: mbr_detailedx.JPG]
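A minimal Python sketch of the split/regroup step, assuming a split can be modeled as a (name, size) pair and that the "specified size" is a single target byte count. The function name `regroup_splits` and the `name@offset` piece labels are hypothetical illustrations, not Hadoop API. Packing here is next-fit in input order, which keeps neighbouring pieces together but is not an optimal bin packing.

```python
from typing import List, Tuple

# Hypothetical model of an input split: (name, size in bytes).
Split = Tuple[str, int]


def regroup_splits(splits: List[Split], target: int) -> List[List[Split]]:
    """Cut oversized splits into target-sized pieces, then pack pieces
    into logical groups of at most `target` bytes (next-fit).

    No data is copied: a group is only a list of (piece name, size)
    entries that one map task would read, so regrouping is logical.
    """
    if target <= 0:
        raise ValueError("target must be positive")

    # Step 1: re-split anything bigger than the target size.
    pieces: List[Split] = []
    for name, size in splits:
        offset = 0
        while size - offset > target:
            pieces.append((f"{name}@{offset}", target))
            offset += target
        pieces.append((f"{name}@{offset}", size - offset))

    # Step 2: combine consecutive small pieces until the next one would overflow.
    groups: List[List[Split]] = []
    current: List[Split] = []
    current_size = 0
    for piece in pieces:
        if current and current_size + piece[1] > target:
            groups.append(current)
            current, current_size = [], 0
        current.append(piece)
        current_size += piece[1]
    if current:
        groups.append(current)
    return groups


if __name__ == "__main__":
    for group in regroup_splits([("a", 250), ("b", 30), ("c", 40), ("d", 100)], 100):
        print(group, sum(size for _, size in group))
```

With splits of 250, 30, 40 and 100 bytes and a target of 100, this yields groups of 100, 100, 80, 40 and 100 bytes: the 250-byte split is cut into three pieces, and its 50-byte tail is grouped with the 30-byte split.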

2010/2/6 jian yi <eyjian@gmail.com>

> In the MR (Map-Reduce) model, reduce tasks are not balanced, because the
> partitions differ in size. How can they be balanced? We can control the size
> of each partition: rehash the bigger partitions and combine partitions up to
> the specified size. If a key has many values, it's necessary to execute
> MapReduce twice. The following is the model diagram:
> [image: Map-Balance-Reduce.JPG]
> The scheduler can treat a task as a timeslice, similar to an OS scheduler.
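The partition-balancing idea, including the second pass for a key with many values, might be sketched as follows. Assumptions: per-key record counts are known once the map phase finishes, there is one target record count per reduce task, and the aggregation is a sum. The second pass uses key salting (spreading a hot key over several sub-keys, then merging the partial results), a common technique swapped in here because the message does not say how the two passes work; it is only valid for associative, commutative aggregations. `balance_partitions` and `two_pass_sum` are hypothetical names.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def balance_partitions(key_counts: Dict[str, int], target: int) -> Tuple[List[List[str]], List[str]]:
    """Assign keys to reduce groups holding at most `target` records each.

    Keys are processed largest first and placed into the first group that
    still has room (first-fit decreasing). A key with more than `target`
    records cannot be divided inside one reduce without changing its
    semantics, so it is returned separately for a second MapReduce pass.
    """
    groups: List[List[str]] = []
    loads: List[int] = []
    hot_keys: List[str] = []
    for key, count in sorted(key_counts.items(), key=lambda kv: (-kv[1], kv[0])):
        if count > target:
            hot_keys.append(key)
            continue
        for i, load in enumerate(loads):
            if load + count <= target:
                groups[i].append(key)
                loads[i] += count
                break
        else:  # no existing group had room: open a new one
            groups.append([key])
            loads.append(count)
    return groups, hot_keys


def two_pass_sum(records: List[Tuple[str, int]], hot_keys: List[str], fanout: int) -> Dict[str, int]:
    """Sum values per key in two passes (key salting, assumed technique).

    Pass 1 spreads each hot key over `fanout` salted sub-keys so that no
    single reducer sees all of its values; pass 2 strips the salt and adds
    the partial sums. Ordinary keys always get salt 0.
    """
    if fanout <= 0:
        raise ValueError("fanout must be positive")
    hot = set(hot_keys)
    partial: Dict[Tuple[str, int], int] = defaultdict(int)
    for i, (key, value) in enumerate(records):
        salt = i % fanout if key in hot else 0
        partial[(key, salt)] += value
    final: Dict[str, int] = defaultdict(int)
    for (key, _salt), value in partial.items():
        final[key] += value
    return dict(final)


if __name__ == "__main__":
    counts = {"a": 500, "b": 60, "c": 50, "d": 40, "e": 30, "f": 20}
    print(balance_partitions(counts, 100))
```

With counts a=500, b=60, c=50, d=40, e=30, f=20 and a target of 100, `balance_partitions` returns the groups [b, d] and [c, e, f] (100 records each) and flags `a` for the two-pass path.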
