mahout-dev mailing list archives

From Ted Dunning <ted.dunn...@gmail.com>
Subject Re: Some ideas for Mahout 0.5
Date Mon, 04 Oct 2010 18:52:27 GMT
On Mon, Oct 4, 2010 at 11:44 AM, deneche abdelhakim <adeneche@gmail.com> wrote:

> For Decision Forests, my goal for 0.5 is to add a 'full'
> implementation. Meaning, an implementation that can build random
> forests using the whole dataset, even if it's split among many
> machines. I found the following paper to be very interesting:
> http://www.cba.ua.edu/~mhardin/rainforest.pdf
> although the described approach doesn't work as is for numerical
> attributes.
>

Very cool.

I would love it if DF became a first class Mahout classifier.

As well as scaling up, it would be very nice if there were a model
compression step to help with the deployment of DF models.

>
> The implementation should at least work for the following dataset:
>
> http://developer.amazonwebservices.com/connect/entry.jspa?externalID=2304&categoryID=248
> it's 50 GB, and a small subset is available in UCI. It contains only
> categorical attributes, and it's big enough to be a good candidate.
>

Which UCI dataset is this? The income > $50K one?

Does the AWS dataset have household income?
