hadoop-mapreduce-user mailing list archives

From Eric Sammer <e...@lifeless.net>
Subject Re: Should mapreduce.ReduceContext reuse same object in nextKeyValue?
Date Wed, 13 Jan 2010 17:59:34 GMT
On 1/13/10 12:29 PM, Ed Mazur wrote:
> What is the preferred method of avoiding value buffering? For example,
> if you're building a basic inverted index, you have one key (term)
> associated with many values (doc ids) in your reducer. If you want an
> output pair of something like <Text, IntArrayWritable>, is there a way
> to build and output the id array without buffering values? The only
> alternative I see is to instead use <Text, IntWritable> and repeat the
> term for every doc id, but this seems wasteful.
> Ed


In that case, I think you would want to buffer the values. I should
probably correct myself and say that it depends on the application. In
general, the framework assumes that the reduce values for a given key
may not all fit in memory. In specific implementations it
may be fine (or even necessary) for the user to do buffering like this.
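
A minimal sketch of what buffering looks like when the value object is reused between iterations, as the thread subject suggests. It deliberately avoids the Hadoop API so it runs standalone: MutableInt is a hypothetical stand-in for a mutable Writable such as IntWritable, and reusingValues is a hypothetical helper that imitates an iterator handing back the same instance each time. The point is only that buffered values must be copied out, not stored by reference.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

public class BufferingSketch {

    // Hypothetical stand-in for a mutable Writable (e.g. IntWritable).
    static final class MutableInt {
        private int value;
        int get() { return value; }
        void set(int v) { value = v; }
    }

    // Hypothetical helper: imitates a values iterator that returns the
    // SAME object from every next() call, overwriting its contents.
    static Iterable<MutableInt> reusingValues(final int[] docIds) {
        return () -> new Iterator<MutableInt>() {
            private final MutableInt shared = new MutableInt();
            private int i = 0;
            public boolean hasNext() { return i < docIds.length; }
            public MutableInt next() {
                if (!hasNext()) throw new NoSuchElementException();
                shared.set(docIds[i++]);
                return shared;
            }
        };
    }

    // Buffers by copying the primitive out of each value as it is seen.
    static int[] bufferCopies(Iterable<MutableInt> values) {
        List<Integer> ids = new ArrayList<>();
        for (MutableInt v : values) {
            ids.add(v.get()); // copy the contents, not the reference
        }
        int[] out = new int[ids.size()];
        for (int k = 0; k < out.length; k++) out[k] = ids.get(k);
        return out;
    }

    // Buffers the references themselves; every entry ends up pointing
    // at the one shared object, so only the last value survives.
    static int[] bufferReferences(Iterable<MutableInt> values) {
        List<MutableInt> refs = new ArrayList<>();
        for (MutableInt v : values) {
            refs.add(v);
        }
        int[] out = new int[refs.size()];
        for (int k = 0; k < out.length; k++) out[k] = refs.get(k).get();
        return out;
    }

    public static void main(String[] args) {
        int[] docIds = {3, 7, 9};
        System.out.println(Arrays.toString(bufferCopies(reusingValues(docIds))));     // [3, 7, 9]
        System.out.println(Arrays.toString(bufferReferences(reusingValues(docIds)))); // [9, 9, 9]
    }
}
```

In a real reducer the same idea would apply to whatever value type the job uses; the memory cost of the buffer is still bounded only by the number of values per key, which is the trade-off described above.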

Thanks and sorry for the confusion.
Eric Sammer
