mahout-user mailing list archives

From Som Satpathy <somsatpa...@gmail.com>
Subject Re: Query regarding Mahout's distributed Random Forest implementation
Date Tue, 14 Jan 2014 03:35:47 GMT
I found what I was looking for:

https://issues.apache.org/jira/browse/MAHOUT-835
Thanks,
Som


On Thu, Jan 9, 2014 at 8:43 AM, Som Satpathy <somsatpathy@gmail.com> wrote:

> Hi all,
>
> In Mahout 0.8, the distributed Random Forest implementation doesn't seem
> to compute the out-of-bag error while building the RF model. I wanted
> to confirm whether that is really the case.
>
> While browsing the source code of earlier Mahout versions (0.2 to 0.5),
> I came across distributed code that computes the out-of-bag error while
> building the RF model. The classes of note are Step2Job and Step2Mapper,
> neither of which exists in 0.8. The 'callback' package is also gone in
> 0.8. I was wondering why Mahout no longer supports those implementations.
>
> I'm currently using Mahout's RF PartialBuilder and am working on ways to
> evaluate the resulting model. I wanted to know the best strategy for
> evaluating RF models built via PartialBuilder. I can always split my data
> into train and test samples and compute metrics such as AUC. My thinking
> was that if Mahout's distributed RF implementation computed the
> out-of-bag error while generating the model, there would be no need to
> split my offline data into train and test samples.
>
> Looking forward to hearing your thoughts on this.
>
> Thanks,
> Som
>
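
For readers unfamiliar with the out-of-bag estimate discussed above, here is a
minimal single-machine sketch of the idea. It is not Mahout code and does not
use PartialBuilder, Step2Job, or Step2Mapper. The class OobSketch and its
methods are hypothetical names made up for this illustration, and a
one-feature threshold "stump" stands in for a real decision tree. Each model
is fit on a bootstrap sample, and each row is scored only by the models whose
sample did not contain it, which is why no separate test split is needed for
this estimate.

```java
import java.util.Random;

/** Illustrative only: bagged threshold stumps with an out-of-bag error estimate. */
class OobSketch {

    /**
     * Fits a stump on the bootstrap sample: among the sampled x values, picks the
     * threshold with the fewest errors for the rule "predict 1 when x > threshold".
     */
    static double fitStump(double[] x, int[] y, int[] sample) {
        double best = x[sample[0]];
        int bestErr = Integer.MAX_VALUE;
        for (int c : sample) {
            int err = 0;
            for (int i : sample) {
                int pred = x[i] > x[c] ? 1 : 0;
                if (pred != y[i]) err++;
            }
            if (err < bestErr) { bestErr = err; best = x[c]; }
        }
        return best;
    }

    /** Returns the out-of-bag error in [0, 1], or NaN if no row was ever out of bag. */
    static double oobError(double[] x, int[] y, int numTrees, long seed) {
        int n = x.length;
        Random rng = new Random(seed);
        int[] votesFor1 = new int[n]; // OOB votes for class 1, per row
        int[] votes = new int[n];     // total OOB votes, per row
        for (int t = 0; t < numTrees; t++) {
            // Draw n rows with replacement and remember which rows were drawn.
            int[] sample = new int[n];
            boolean[] inBag = new boolean[n];
            for (int k = 0; k < n; k++) {
                sample[k] = rng.nextInt(n);
                inBag[sample[k]] = true;
            }
            double threshold = fitStump(x, y, sample);
            // Only rows this model never saw get a vote from it.
            for (int i = 0; i < n; i++) {
                if (inBag[i]) continue;
                votes[i]++;
                if (x[i] > threshold) votesFor1[i]++;
            }
        }
        int scored = 0;
        int wrong = 0;
        for (int i = 0; i < n; i++) {
            if (votes[i] == 0) continue; // row was in every bag, so it cannot be scored
            scored++;
            int pred = 2 * votesFor1[i] > votes[i] ? 1 : 0; // majority vote, ties go to 0
            if (pred != y[i]) wrong++;
        }
        return scored == 0 ? Double.NaN : (double) wrong / scored;
    }
}
```

Calling OobSketch.oobError(x, y, 50, 42L) on a labelled data set returns the
fraction of scored rows that the out-of-bag majority vote gets wrong. A
held-out test set is still the more direct way to obtain metrics such as AUC.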
