mahout-user mailing list archives

From Jake Mannix <>
Subject Re: Re : Reg: Maximum Split size in Random Forest
Date Wed, 09 Jun 2010 17:48:40 GMT
On Tue, Jun 8, 2010 at 9:19 PM, deneche abdelhakim <> wrote:

> mapred.max.split.size caps the size of each input split, so it controls how
> many partitions will be generated from the data.
> The current implementation of random forest is pretty memory intensive, and
> because all the work is done in the mappers' close method, when the data is
> big, Hadoop just thinks that the mappers have failed (I will solve this
> problem some day).

Periodically calling Reporter.progress() in the long-lived mapper typically
fixes this: it tells Hadoop the task is still alive, so the task is not killed
for exceeding mapred.task.timeout.
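
Here is a minimal sketch of that idea. It is not Mahout's actual code. In the
old org.apache.hadoop.mapred API, close() receives no Reporter, so the mapper
has to keep the one passed to map() and use it during the long-running work.
To keep the example runnable without Hadoop on the classpath, ProgressReporter
is a stand-in for Hadoop's Reporter. LongLivedMapper, reportEvery, and
expensiveStep are hypothetical names chosen for illustration.

```java
public class ProgressSketch {

    /** Stand-in (assumption) for the progress() method of Hadoop's Reporter. */
    public interface ProgressReporter {
        void progress();
    }

    /** Hypothetical mapper that defers its heavy work to close(). */
    public static class LongLivedMapper {
        private final int reportEvery;
        // Saved in map() because close() is not handed a reporter.
        private ProgressReporter reporter;

        public LongLivedMapper(int reportEvery) {
            if (reportEvery <= 0) {
                throw new IllegalArgumentException("reportEvery must be positive");
            }
            this.reportEvery = reportEvery;
        }

        /** Called once per input record; here it only remembers the reporter. */
        public void map(String record, ProgressReporter reporter) {
            this.reporter = reporter;
        }

        /** Does all the work, signalling progress every reportEvery steps. */
        public long close(int workUnits) {
            long result = 0;
            for (int i = 0; i < workUnits; i++) {
                result += expensiveStep(i);
                if (reporter != null && (i + 1) % reportEvery == 0) {
                    reporter.progress(); // tell the framework we are still alive
                }
            }
            return result;
        }

        /** Placeholder for one slow unit of work, such as growing part of a tree. */
        private long expensiveStep(int i) {
            return i;
        }
    }

    public static void main(String[] args) {
        int[] pings = {0};
        LongLivedMapper mapper = new LongLivedMapper(100);
        mapper.map("some record", () -> pings[0]++);
        long result = mapper.close(1000);
        System.out.println("result=" + result + ", progress calls=" + pings[0]);
    }
}
```

In a real job the interval would more likely be time-based (say, every few
seconds) than step-based. The only requirement is that consecutive calls are
spaced well inside the task timeout.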

