hadoop-common-user mailing list archives

From Sami Dalouche <sa...@hopper.com>
Subject Re: lazy-loading of Reduce's input
Date Mon, 03 Oct 2011 19:29:57 GMT
Just to make sure I was clear enough:
- Is there a parameter that sets the size of the batch of elements that are
retrieved into memory while the reduce task iterates over the input values?
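
For context, here is a rough sketch of the reducer structure I am asking
about (the standard org.apache.hadoop.mapreduce.Reducer API; MyWritable
stands in for my custom Writable, and the class name is illustrative):

import java.io.IOException;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class IndexMergeReducer extends Reducer<Text, MyWritable, Text, Text> {
    @Override
    protected void reduce(Text key, Iterable<MyWritable> values, Context context)
            throws IOException, InterruptedException {
        // The question: as this loop advances, are all values for the key
        // already materialized in memory, or are they fetched on demand?
        for (MyWritable value : values) {
            // process one (potentially huge) value at a time
        }
    }
}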

Thanks,
Sami Dalouche

On Mon, Oct 3, 2011 at 1:42 PM, Sami Dalouche <samid@hopper.com> wrote:

> Hi,
>
> My understanding is that when the reduce() method is called, the values
> (Iterable<VALUEIN> values) are stored in memory.
>
> 1/ Is that actually true?
> 2/ If this is true, is there a way to lazy-load the inputs to use less
> memory? (e.g. load the items in batches of 20 and discard the previously
> fetched ones)
> The only related option that I could find is mapreduce.reduce.input.limit,
> but it doesn't do what I need.
>
> The problem I am trying to solve is that my input values are huge objects
> (serialized Lucene indices using a custom Writable implementation), and
> loading them all at once seems to require far too much memory.
>
> Thank You,
> Sami Dalouche
>

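PS: the custom Writable mentioned above follows the standard
org.apache.hadoop.io.Writable contract. A minimal sketch of that shape
(class and field names are illustrative, not the actual code):

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import org.apache.hadoop.io.Writable;

public class SerializedIndexWritable implements Writable {
    private byte[] indexBytes; // the serialized Lucene index

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeInt(indexBytes.length);
        out.write(indexBytes);
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        // The whole payload is deserialized into memory here, which is
        // why per-value memory use matters so much in my case.
        int length = in.readInt();
        indexBytes = new byte[length];
        in.readFully(indexBytes);
    }
}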