hadoop-common-user mailing list archives

From Jason Venner <ja...@attributor.com>
Subject Re: question on Hadoop configuration for non cpu intensive jobs - 0.15.1
Date Tue, 25 Dec 2007 22:52:38 GMT
My mapper in this case is the identity mapper, and the reducer gets 
about 10 values per key and makes a collect decision based on the data 
in the values.
The reducer is very close to a no-op, and uses very little memory 
beyond the values themselves.
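
For concreteness, the shape of the job is roughly this (a minimal 
sketch against the pre-generics 0.15 mapred API; the mapper is just 
org.apache.hadoop.mapred.lib.IdentityMapper, and shouldEmit below is a 
hypothetical stand-in for our real decision, not the actual code):

import java.io.IOException;
import java.util.Iterator;

import org.apache.hadoop.io.Writable;
import org.apache.hadoop.io.WritableComparable;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;

// Near-no-op reducer: scan the ~10 values for a key and emit at most
// one record, holding only a single value at a time.
public class CollectDecisionReducer extends MapReduceBase
    implements Reducer {

  public void reduce(WritableComparable key, Iterator values,
                     OutputCollector output, Reporter reporter)
      throws IOException {
    while (values.hasNext()) {
      Writable value = (Writable) values.next();
      if (shouldEmit(value)) {   // hypothetical stand-in predicate
        output.collect(key, value);
        return;                  // one collect decision per key
      }
    }
  }

  // Placeholder for the real decision logic.
  private boolean shouldEmit(Writable value) {
    return value != null;
  }
}

Since the iterator is consumed one value at a time, the reducer's 
footprint shouldn't depend on how many values a key has.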

I believe the problem is in the amount of buffering in the output files.
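
If buffering is it, these are the knobs I'd expect to matter (a hedged 
sketch; the property names are the 0.15-era ones and the defaults are 
from memory, so verify against hadoop-default.xml before trusting them):

import org.apache.hadoop.mapred.JobConf;

public class BufferTuning {
  public static void configure(JobConf conf) {
    // Map-side sort buffer, in MB; each map task allocates it up front
    // (default 100 in this era, from memory).
    conf.setInt("io.sort.mb", 50);
    // Number of spill streams merged at once while sorting.
    conf.setInt("io.sort.factor", 10);
    // Per-stream file I/O buffer, in bytes (default 4096, from memory).
    conf.setInt("io.file.buffer.size", 64 * 1024);
  }
}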

The quandary we have is that with the standard input split size the 
jobs run very poorly, since the mean time to finish a split is very 
small, while large split sizes bring gigantic memory requirements.
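
One thing worth trying: growing the splits without rewriting the files 
at a bigger block size. A sketch, assuming the stock FileInputFormat 
split math (max of the minimum split size and the block size wins) and 
that mapred.min.split.size is honored in our 0.15 build:

import org.apache.hadoop.mapred.JobConf;

public class SplitTuning {
  // Grow splits past one block without changing dfs.block.size on disk;
  // a minimum split size above the block size wins in the split math.
  public static void useLargeSplits(JobConf conf, long splitBytes) {
    conf.setLong("mapred.min.split.size", splitBytes);
  }
}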

Time to play with parameters again ... since the answer doesn't appear 
to be in the list's working memory.

Ted Dunning wrote:
> What are your mappers doing that they run out of memory?  Or is it your
> reducers?
> Often, you can write this sort of program so that you don't have higher
> memory requirements for larger splits.
> On 12/25/07 1:52 PM, "Jason Venner" <jason@attributor.com> wrote:
>> We have tried reducing the number of splits by increasing the block
>> sizes to 10x and 5x 64meg, but then we constantly have out of memory
>> errors and timeouts. At this point each jvm is getting 768M and I can't
>> readily allocate more without dipping into swap.
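
On the 768M per JVM in the quote above: that's the per-child heap set 
via mapred.child.java.opts, and the per-node footprint multiplies by 
the number of task slots, which is why we can't raise it without 
dipping into swap. A hedged sketch of the setting:

import org.apache.hadoop.mapred.JobConf;

public class HeapTuning {
  // Heap for each forked task JVM. Rough per-node footprint is
  // (map slots + reduce slots) * this value, so 768m per child
  // exhausts physical RAM quickly.
  public static void configure(JobConf conf) {
    conf.set("mapred.child.java.opts", "-Xmx768m");
  }
}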
