mahout-dev mailing list archives

From Ted Dunning
Subject Re: Out-of-core random forest implementation
Date Fri, 25 Jan 2013 21:52:07 GMT
Hey Andy,

There are no plans for this.  You are correct that multiple passes aren't
too difficult, but they do go against the standard map-reduce paradigm a
bit if you want to avoid iterative map-reduce.
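
To make the multiple-pass idea concrete, here is a minimal sketch (not Mahout code) that grows a single decision tree with one streaming pass over the data per tree level, keeping only per-node class counts in memory. Everything in it is an assumption made for illustration: the names `stream_rows`, `grow_tree`, `route` and `predict` are hypothetical, the features are assumed to lie in [0, 1], and the candidate cut points in `THRESHOLDS` are fixed up front. A real forest would repeat this per tree with resampling and random feature subsets, which the sketch leaves out.

```python
import random
from collections import Counter, defaultdict

# Assumption: features are scaled to [0, 1] and split candidates are fixed.
THRESHOLDS = [0.25, 0.5, 0.75]

def stream_rows(seed=0, n=2000):
    """Hypothetical data source. Each call re-reads the data (one pass)."""
    rng = random.Random(seed)
    for _ in range(n):
        x = (rng.random(), rng.random())
        yield x, int(x[0] > 0.5)  # toy label that depends on feature 0 only

def gini(counts):
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return 1.0 - sum((c / total) ** 2 for c in counts.values())

def route(tree, x):
    """Follow the splits decided so far; stop at a leaf or an undecided node."""
    node = 0
    while node in tree and tree[node][0] == "split":
        _, f, t, left, right = tree[node]
        node = left if x[f] <= t else right
    return node

def grow_tree(make_stream, n_features, max_depth=3):
    tree = {}        # node id -> ("split", f, t, left, right) or ("leaf", label)
    frontier = {0}   # nodes still waiting for a decision
    next_id = 1
    for depth in range(max_depth + 1):
        if not frontier:
            break
        # Only these counts are held in memory, never the rows themselves.
        stats = defaultdict(lambda: defaultdict(lambda: [Counter(), Counter()]))
        totals = defaultdict(Counter)
        for x, y in make_stream():  # one full pass over the data
            node = route(tree, x)
            if node not in frontier:
                continue  # the row ended in a finished leaf
            totals[node][y] += 1
            for f in range(n_features):
                for t in THRESHOLDS:
                    side = 0 if x[f] <= t else 1
                    stats[node][(f, t)][side][y] += 1
        new_frontier = set()
        for node in sorted(frontier):
            counts = totals[node]
            n = sum(counts.values())
            majority = counts.most_common(1)[0][0] if n else 0
            best = None
            if depth < max_depth and n:
                parent = gini(counts)
                for (f, t), (left, right) in stats[node].items():
                    nl, nr = sum(left.values()), sum(right.values())
                    if nl == 0 or nr == 0:
                        continue
                    score = (nl * gini(left) + nr * gini(right)) / n
                    improves = score < parent - 1e-12
                    if improves and (best is None or score < best[0]):
                        best = (score, f, t)
            if best is None:
                tree[node] = ("leaf", majority)
            else:
                _, f, t = best
                tree[node] = ("split", f, t, next_id, next_id + 1)
                new_frontier.update((next_id, next_id + 1))
                next_id += 2
        frontier = new_frontier
    return tree

def predict(tree, x):
    return tree[route(tree, x)][1]

tree = grow_tree(stream_rows, n_features=2)
```

In a map-reduce setting each iteration of the outer loop would correspond to one job, with the count accumulation in the mappers and the split selection in the reducer. That is the iterative map-reduce pattern referred to above.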

It definitely would be nice to have a really competitive random forest
implementation that uses the global accumulator style plus long-lived
mappers.  The basic idea would be to use the same sort of tricks that
Vowpal Wabbit or Giraph use to get a bunch of long-lived mappers and then
have them asynchronously talk to a tree repository.

On Fri, Jan 25, 2013 at 6:58 PM, Andy Twigg wrote:

> Hi,
> I'm new to this list, so I apologise if this is covered elsewhere (but
> I couldn't find it).
> I'm looking at the Random Forests implementations, both mapreduce
> ("partial") and non-distributed. Both appear to require the data
> loaded into memory. Random forests should be straightforward to
> construct with multiple passes through the data without storing the
> data in memory. Is there such an implementation in Mahout? If not, is
> there a ticket/plan?
> Thanks,
> Andy
> --
> Dr Andy Twigg
> Junior Research Fellow, St Johns College, Oxford
> Room 351, Department of Computer Science
> | +447799647538
