mahout-dev mailing list archives

From Ted Dunning <>
Subject Re: Some ideas for Mahout 0.5
Date Mon, 04 Oct 2010 18:52:27 GMT
On Mon, Oct 4, 2010 at 11:44 AM, deneche abdelhakim <> wrote:

> For Decision Forests, my goal for 0.5 is to add a 'full'
> implementation. Meaning, an implementation that can build random
> forests using the whole dataset, even if it's split among many
> machines. I found the following paper to be very interesting:
> although the described approach doesn't work as it is for numerical
> attributes.

Very cool.

I would love it if DF became a first class Mahout classifier.

As well as scaling up, it would be very nice if there were a model
compression step to help with the deployment of DF.

> The implementation should at least work for the following dataset:
> it's 50 GB, and a small subset is available in UCI. It contains only
> categorical attributes, and it's big enough to be a good candidate.

Which UCI dataset is this?  The income>50k$ one?

Does the AWS dataset have household income?
