mahout-user mailing list archives

From "Bhattacharjee, Rohan" <robhattac...@ebay.com>
Subject RE: Classification Algorithms in Mahout
Date Wed, 10 Apr 2013 18:34:37 GMT
Doesn't the "random" part of random forest defend against overfitting?
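
The "random" part (bootstrap sampling of rows plus a random feature subset at each split) mainly reduces variance across trees; an individual tree can still grow deep enough to memorize its own bootstrap sample, so a forest can still overfit noisy data, just less badly than a single tree would. A minimal sketch of those two sources of randomness, in plain Java with hypothetical names (this is not Mahout's API):

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Illustrative only: the two sources of randomness in a random forest,
// bootstrap sampling of rows and a random feature subset per split.
// They decorrelate the trees and so reduce variance, but each tree can
// still memorize its own bootstrap sample if it is grown without limits.
public class ForestRandomnessSketch {

    private static final Random RNG = new Random(42);

    // Bootstrap sample: n row indices drawn with replacement.
    static int[] bootstrapRows(int n) {
        int[] rows = new int[n];
        for (int i = 0; i < n; i++) {
            rows[i] = RNG.nextInt(n);
        }
        return rows;
    }

    // m feature indices drawn without replacement for one split node.
    static List<Integer> randomFeatureSubset(int numFeatures, int m) {
        List<Integer> all = new ArrayList<>();
        for (int f = 0; f < numFeatures; f++) {
            all.add(f);
        }
        Collections.shuffle(all, RNG);
        return new ArrayList<>(all.subList(0, Math.min(m, numFeatures)));
    }

    public static void main(String[] args) {
        // 10 training rows, 8 features, about sqrt(8) features per split.
        System.out.println(java.util.Arrays.toString(bootstrapRows(10)));
        System.out.println(randomFeatureSubset(8, 3));
    }
}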


-----Original Message-----
From: ey-chih chow [mailto:eychih@gmail.com] 
Sent: Saturday, April 06, 2013 5:45 PM
To: user@mahout.apache.org
Subject: Re: Classification Algorithms in Mahout

I actually got a lot of overfitting.  The parameter that I can adjust is minSplitNum.  Are
there any other parameters that I can adjust to avoid overfitting?  Thanks.

Ey-Chih
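
Besides minSplitNum, the knobs people usually turn are the number of trees (more trees rarely hurts) and the number of randomly selected variables per split (fewer makes the trees less correlated); which of these the BuildForest driver exposes depends on the Mahout version, so check its usage output for your build. Whatever knob you tune, the reliable way to see overfitting is to compare training error against error on a held-out split. A hypothetical sketch of that sweep in plain Java (the ErrorFn stand-in is where the real forest training and scoring would go):

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Illustrative only: hold out part of the data once, then sweep a
// regularization knob such as minSplitNum and compare training error
// with held-out error. A large gap is the usual symptom of overfitting;
// keep the knob value where the held-out error is lowest.
public class MinSplitSweep {

    // Stand-in for "build a forest with this minSplitNum on trainRows,
    // score scoreRows, and return the error rate".
    interface ErrorFn {
        double error(List<Integer> trainRows, List<Integer> scoreRows, int minSplitNum);
    }

    public static void main(String[] args) {
        List<Integer> rows = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            rows.add(i);
        }
        Collections.shuffle(rows, new Random(42));

        // 80/20 split of row indices into train and holdout.
        List<Integer> train = rows.subList(0, 800);
        List<Integer> holdout = rows.subList(800, rows.size());

        // Dummy scorer so the sketch runs on its own; replace with real
        // forest training and evaluation.
        ErrorFn fn = (tr, sc, minSplit) -> 1.0 / (minSplit + sc.size() % 13 + 1);

        for (int minSplit : new int[] {2, 10, 50, 200}) {
            double trainErr = fn.error(train, train, minSplit);
            double holdoutErr = fn.error(train, holdout, minSplit);
            System.out.printf("minSplitNum=%d train=%.4f holdout=%.4f gap=%.4f%n",
                minSplit, trainErr, holdoutErr, holdoutErr - trainErr);
        }
    }
}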


On Wed, Mar 27, 2013 at 3:12 PM, Andy Twigg <andy.twigg@gmail.com> wrote:

> Dear Ey-Chih,
>
> What are your use cases for a better random forest?
>
> On 27 March 2013 11:59, Yutaka Mandai <20525entradero@gmail.com> wrote:
> > My understanding is that the current Random Forest has some improvements
> > for running on a Hadoop cluster, from a data-splitting alignment
> > perspective, for better-balanced CPU utilization.
> > Regards,
> > Y.Mandai
> >
> > Sent from my iPhone
> >
> > On 2013/03/25, at 14:48, Ted Dunning <ted.dunning@gmail.com> wrote:
> >
> >> I think that there are some others who could say more.
> >>
> >> On Mon, Mar 25, 2013 at 6:01 AM, Ey-Chih chow <eychih@gmail.com> wrote:
> >>
> >>> On Mar 24, 2013, at 1:00 AM, Ted Dunning wrote:
> >>>
> >>>> - random forest, sequential and parallel implementations, new
> >>>> versions are being developed, the current version may or may not
> >>>> be useful to you.
> >>>>
> >>> Can you elaborate on the usefulness of the current version and the
> >>> features of the new versions? Thanks.
> >>>
> >>> Ey-Chih Chow
> >>>
> >>>
> >>> On Mar 24, 2013, at 1:00 AM, Ted Dunning wrote:
> >>>
> >>>> You are correct to suspect that this page is substantially out of
> >>>> date.
> >>>>
> >>>> Currently, Mahout has the following classifiers:
> >>>>
> >>>> - stochastic gradient descent for logistic regression (SGD) with
> >>>> L_1 or L_2 regularization, sequential version only. These
> >>>> classifiers can be easily extended with other gradients and
> >>>> regularizers, which should make linear SVMs easy to implement
> >>>> (see the SGD sketch after this list).
> >>>>
> >>>> - naive bayes, sequential and parallel implementations
> >>>>
> >>>> - random forest, sequential and parallel implementations, new
> >>>> versions are being developed, the current version may or may not
> >>>> be useful to you.
> >>>>
> >>>> There are a variety of other classifiers which are in various
> >>>> states of utility.
> >>>>
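
A minimal sketch of the sequential SGD classifier from the list above, written against the org.apache.mahout.classifier.sgd package roughly as it stood in the Mahout 0.7/0.8 line; treat the exact class and method names as something to verify against your version:

import org.apache.mahout.classifier.sgd.L1;
import org.apache.mahout.classifier.sgd.OnlineLogisticRegression;
import org.apache.mahout.math.DenseVector;
import org.apache.mahout.math.Vector;

// Sketch of sequential SGD logistic regression with an L_1 prior.
// Class and method names follow org.apache.mahout.classifier.sgd
// circa Mahout 0.7/0.8; verify against your version.
public class SgdSketch {
    public static void main(String[] args) {
        int numFeatures = 3;
        // 2 target categories; the prior object is the regularizer.
        OnlineLogisticRegression learner =
            new OnlineLogisticRegression(2, numFeatures, new L1());
        learner.learningRate(1.0);   // SGD step size
        learner.lambda(1.0e-4);      // regularization strength

        // Toy data: the label is 1 when the first feature is large.
        double[][] x = {{0.1, 1, 0.2}, {0.9, 0, 0.4}, {0.2, 1, 0.9}, {0.8, 1, 0.1}};
        int[] y = {0, 1, 0, 1};

        for (int pass = 0; pass < 50; pass++) {
            for (int i = 0; i < x.length; i++) {
                learner.train(y[i], new DenseVector(x[i]));
            }
        }

        // For two categories, classifyScalar returns p(category == 1).
        Vector query = new DenseVector(new double[] {0.85, 1.0, 0.3});
        System.out.println(learner.classifyScalar(query));
    }
}

Swapping new L1() for new L2() (same package) changes the regularizer, which is the extension point behind the remark about other gradients and regularizers.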
> >>>> On Mar 24, 2013, at 4:07 AM, Chidananda Sridhar wrote:
> >>>>
> >>>>> Hi,
> >>>>>
> >>>>> I am doing a class project on classification and want to use
> >>>>> Mahout. I was searching for the classification algorithms already
> >>>>> implemented in Mahout and came to this page:
> >>>>> https://cwiki.apache.org/confluence/display/MAHOUT/Algorithms
> >>>>>
> >>>>> The webpage says that Online Passive Aggressive
> >>>>> <https://cwiki.apache.org/confluence/display/MAHOUT/Online+Passive+Aggressive>
> >>>>> is integrated and the rest of the classification algorithms are
> >>>>> open or awaiting commit. Does the webpage have the latest
> >>>>> information, or is it yet to be updated? Is "Online Passive
> >>>>> Aggressive" the only algorithm I can use for now? On the other
> >>>>> hand, I see that most of the clustering algorithms have been
> >>>>> integrated.
> >>>>>
> >>>>> Thanks,
> >>>>> Chidananda
> >>>>
> >>>
> >>>
>
>
>
> --
> Dr Andy Twigg
> Junior Research Fellow, St Johns College, Oxford
> Room 351, Department of Computer Science
> http://www.cs.ox.ac.uk/people/andy.twigg/
> andy.twigg@cs.ox.ac.uk | +447799647538
>
