mahout-dev mailing list archives

From "Deneche A. Hakim (JIRA)" <>
Subject [jira] Commented: (MAHOUT-122) Random Forests Reference Implementation
Date Sun, 07 Jun 2009 09:24:07 GMT


Deneche A. Hakim commented on MAHOUT-122:

I've been reading Breiman's paper about Random Forests ([available here|]),
and on page 9 he says:

"Grow the tree using CART methodology to maximum size and do not prune."

So apparently he uses the CART algorithm to grow the trees and, if I'm not mistaken, it differs
from the algorithm that I described in the wiki [].
The most important difference is the way it splits CATEGORICAL attributes:
* in the algorithm that I'm using, a node is built for each value of the attribute
* in CART, a best split value is found (in a similar way to NUMERICAL attributes) and only
two nodes are built, depending on whether the attribute's value equals the split value or not
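
To make the difference concrete, here is a minimal Java sketch of the two ways of splitting a CATEGORICAL attribute described above. All names (CategoricalSplits, multiwaySplit, binarySplit, bestSplitValue) are hypothetical and not taken from the patch. Rows are plain String arrays, and the use of Gini impurity to pick the "best" split value is an assumption; the comment above does not say which criterion is used.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Illustrative only: hypothetical names, not part of Mahout.
final class CategoricalSplits {

  /** Multiway split: one child partition per distinct value of the attribute. */
  static Map<String, List<String[]>> multiwaySplit(List<String[]> rows, int attr) {
    Map<String, List<String[]>> children = new LinkedHashMap<>();
    for (String[] row : rows) {
      children.computeIfAbsent(row[attr], k -> new ArrayList<>()).add(row);
    }
    return children;
  }

  /** Binary split: rows whose value equals splitValue first, all other rows second. */
  static List<List<String[]>> binarySplit(List<String[]> rows, int attr, String splitValue) {
    List<String[]> equal = new ArrayList<>();
    List<String[]> notEqual = new ArrayList<>();
    for (String[] row : rows) {
      if (row[attr].equals(splitValue)) {
        equal.add(row);
      } else {
        notEqual.add(row);
      }
    }
    List<List<String[]>> parts = new ArrayList<>();
    parts.add(equal);
    parts.add(notEqual);
    return parts;
  }

  /** Gini impurity of the labels in rows (assumed criterion, see lead-in). */
  static double gini(List<String[]> rows, int labelAttr) {
    if (rows.isEmpty()) {
      return 0.0;
    }
    Map<String, Integer> counts = new HashMap<>();
    for (String[] row : rows) {
      counts.merge(row[labelAttr], 1, Integer::sum);
    }
    double sumSquares = 0.0;
    for (int count : counts.values()) {
      double p = (double) count / rows.size();
      sumSquares += p * p;
    }
    return 1.0 - sumSquares;
  }

  /** Tries every distinct value as split value and keeps the lowest weighted impurity. */
  static String bestSplitValue(List<String[]> rows, int attr, int labelAttr) {
    Set<String> values = new LinkedHashSet<>();
    for (String[] row : rows) {
      values.add(row[attr]);
    }
    String best = null;
    double bestScore = Double.POSITIVE_INFINITY;
    for (String value : values) {
      double score = 0.0;
      for (List<String[]> part : binarySplit(rows, attr, value)) {
        score += part.size() * gini(part, labelAttr);
      }
      score /= rows.size();
      if (score < bestScore) {
        bestScore = score;
        best = value;
      }
    }
    return best;
  }

  public static void main(String[] args) {
    // Toy data: column 0 is the categorical attribute, column 1 is the label.
    List<String[]> rows = Arrays.asList(
        new String[] {"sunny", "no"}, new String[] {"sunny", "no"},
        new String[] {"overcast", "yes"}, new String[] {"rain", "yes"},
        new String[] {"rain", "no"});
    System.out.println("multiway children: " + multiwaySplit(rows, 0).keySet());
    System.out.println("binary split value: " + bestSplitValue(rows, 0, 1));
  }
}
```

With the multiway split the number of children depends on how many distinct values the attribute takes, while the binary split always produces exactly two children.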

I think the best thing to do is to create an abstract DecisionTreeBuilder class; that
way we can plug in whichever implementation we want.
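
A rough sketch of what such an abstraction could look like is given below. Only the name DecisionTreeBuilder comes from the comment above. Node, Leaf, MajorityStumpBuilder, ForestDemo and every method signature are hypothetical placeholders, and the last column of each row is assumed to hold the label. Bootstrap sampling is deliberately left out to keep the example short.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: apart from the name DecisionTreeBuilder, nothing here is from the patch.
interface Node {
  String classify(String[] instance);
}

/** Terminal node that always predicts the same label. */
final class Leaf implements Node {
  private final String label;

  Leaf(String label) {
    this.label = label;
  }

  @Override
  public String classify(String[] instance) {
    return label;
  }
}

abstract class DecisionTreeBuilder {
  /** Grows one tree; the last column of each row is assumed to be the label. */
  abstract Node build(List<String[]> rows);

  /** Helper shared by implementations: most frequent label among the rows. */
  static String majorityLabel(List<String[]> rows) {
    if (rows.isEmpty()) {
      throw new IllegalArgumentException("no rows");
    }
    Map<String, Integer> counts = new HashMap<>();
    for (String[] row : rows) {
      counts.merge(row[row.length - 1], 1, Integer::sum);
    }
    String best = null;
    int bestCount = -1;
    for (Map.Entry<String, Integer> e : counts.entrySet()) {
      if (e.getValue() > bestCount) {
        bestCount = e.getValue();
        best = e.getKey();
      }
    }
    return best;
  }
}

/** Placeholder standing in for a real builder (multiway or CART-style). */
final class MajorityStumpBuilder extends DecisionTreeBuilder {
  @Override
  Node build(List<String[]> rows) {
    return new Leaf(majorityLabel(rows));
  }
}

final class ForestDemo {
  /** The forest code depends only on the abstract builder, not on a concrete algorithm. */
  static List<Node> growForest(DecisionTreeBuilder builder, List<String[]> rows, int nbTrees) {
    List<Node> trees = new ArrayList<>();
    for (int i = 0; i < nbTrees; i++) {
      trees.add(builder.build(rows)); // a real forest would pass a bootstrap sample here
    }
    return trees;
  }

  public static void main(String[] args) {
    List<String[]> rows = Arrays.asList(
        new String[] {"a", "yes"}, new String[] {"b", "yes"}, new String[] {"c", "no"});
    List<Node> forest = growForest(new MajorityStumpBuilder(), rows, 3);
    System.out.println(forest.get(0).classify(new String[] {"z"}));
  }
}
```

Switching between the current algorithm and a CART-style one would then only mean passing a different DecisionTreeBuilder subclass to growForest.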

> Random Forests Reference Implementation
> ---------------------------------------
>                 Key: MAHOUT-122
>                 URL:
>             Project: Mahout
>          Issue Type: Task
>          Components: Classification
>    Affects Versions: 0.2
>            Reporter: Deneche A. Hakim
>         Attachments: 2w_patch.diff, RF reference.patch
>   Original Estimate: 25h
>  Remaining Estimate: 25h
> This is the first step of my GSOC project. Implement a simple, easy to understand, reference
implementation of Random Forests (Building and Classification). The only requirement here
is that "it works"

