hadoop-mapreduce-user mailing list archives

From "Owen O'Malley" <omal...@apache.org>
Subject Re: output/input ratio > 1 for map tasks?
Date Mon, 30 Jul 2012 21:57:01 GMT
On Mon, Jul 30, 2012 at 11:47 AM, brisk <mylinqiao@gmail.com> wrote:

> Hi,
>
> Does anybody know of cases where the output/input ratio for map tasks
> is larger than 1? I can only think of sort, where it's 1, and search
> jobs, where it's usually smaller than 1...
>

The traditional case is building an inverted index of some sort. The input
is the set of documents, the shuffle (that is, the map output) is the set of
search terms and their targets, and the output is the final index. The
shuffle is much larger than either the input or the output.
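
As a rough illustration of why the map output grows, here is a minimal Python
sketch, not Hadoop code. It assumes a toy corpus, whitespace tokenization, and
a plain "term<TAB>doc:position" record format; the names map_document,
reduce_postings and serialized_size are made up for this example. Every token
is emitted together with its document id and position, so each record is
longer than the word it came from.

```python
from collections import defaultdict


def map_document(doc_id, text):
    """Map step: emit one (term, posting) pair per token in the document."""
    for position, term in enumerate(text.lower().split()):
        yield term, f"{doc_id}:{position}"


def reduce_postings(pairs):
    """Reduce step: group postings by term to form the final index."""
    index = defaultdict(list)
    for term, posting in pairs:
        index[term].append(posting)
    return dict(index)


def serialized_size(pairs):
    """Bytes needed to write each pair as a 'term<TAB>posting<NEWLINE>' record."""
    return sum(len(f"{term}\t{posting}\n".encode("utf-8")) for term, posting in pairs)


# Hypothetical toy corpus, chosen only for illustration.
documents = {
    "doc1": "the quick brown fox jumps over the lazy dog",
    "doc2": "the dog barks at the quick fox",
}

input_bytes = sum(len(text.encode("utf-8")) for text in documents.values())
map_output = [
    pair
    for doc_id, text in documents.items()
    for pair in map_document(doc_id, text)
]
map_output_bytes = serialized_size(map_output)
index = reduce_postings(map_output)

print(f"input bytes:      {input_bytes}")
print(f"map output bytes: {map_output_bytes}")
print(f"map output/input ratio: {map_output_bytes / input_bytes:.2f}")
```

On this corpus the ratio comes out above 1 because the document id and
position are repeated for every token. A real job would likely use a more
compact record encoding and a combiner, so the actual ratio depends on the
data and the format.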

-- Owen
