mahout-user mailing list archives

From "Nowal, Akshay" <Akshay_No...@SYNTELINC.COM>
Subject RE: Difference when we don't use partial implementation
Date Thu, 05 Jul 2012 06:03:21 GMT
Hey, thanks for the quick response.

If I am understanding you properly, "every tree grown is trained on the whole dataset" means
that all the features/variables are used for building the trees, whereas in partial we take
a subset of the features/variables?
Kindly correct me if I am wrong.

Thanks again

Regards,
Akshay Nowal

-----Original Message-----
From: deneche abdelhakim [mailto:adeneche@gmail.com] 
Sent: Thursday, July 05, 2012 11:23 AM
To: user@mahout.apache.org
Subject: Re: Difference when we don't use partial implementation

Hi Akshay,

When you don't use the "-p" parameter, the builder loads the whole dataset
into memory on every computing node, so every tree grown is trained on the
whole dataset (using bagging, of course, to select a subset of it). With
"-p", every computing node loads only a part of the dataset (hence the name
"partial"), so the trees are trained on parts of the dataset. The training
algorithm is the same in both implementations; the partial
implementation is meant for datasets that are too big to fit in memory.
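
To make the distinction concrete, here is a minimal Python sketch of the two modes. It is not Mahout's code: the function names, the contiguous partitioning, and the round-robin assignment of trees to partitions are all assumptions made for illustration. It only shows which rows each tree's bagging step can draw from; every feature stays available in both modes.

```python
import random


def bootstrap(rows, rng):
    """Bagging: draw len(rows) rows with replacement."""
    return [rng.choice(rows) for _ in rows]


def split_into_partitions(rows, n_partitions):
    """Contiguous chunks, loosely mimicking input splits (assumption)."""
    size = -(-len(rows) // n_partitions)  # ceiling division
    return [rows[i:i + size] for i in range(0, len(rows), size)]


def in_memory_training_sets(rows, n_trees, rng):
    """Without -p: every tree bags from the whole dataset."""
    return [bootstrap(rows, rng) for _ in range(n_trees)]


def partial_training_sets(rows, n_partitions, n_trees, rng):
    """With -p: every tree bags only from the partition its node loaded.

    Trees are assigned to partitions round-robin here, which is a
    hypothetical choice and not necessarily what Mahout does.
    """
    partitions = split_into_partitions(rows, n_partitions)
    return [bootstrap(partitions[t % len(partitions)], rng)
            for t in range(n_trees)]
```

With 100 rows and 4 partitions, each in-memory tree sees a sample of 100 rows drawn from all 100, while each partial tree sees a sample of 25 rows drawn from its own 25-row partition.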

On Thu, Jul 5, 2012 at 4:38 AM, Nowal, Akshay <Akshay_Nowal@syntelinc.com> wrote:

> Hi All,
>
>
>
> I am running Decision Forest in Mahout. Below are the commands that I
> have used to run the algorithm:
>
>
>
> Info file:
>
> mahout org.apache.mahout.df.tools.Describe -p
> /user/an32665/KDD/KDDTrain+.arff -f /user/an32665/KDD/KDDTrain+.info -d
> N 3 C 2 N C 4 N C 8 N 2 C 19 N L
>
> Building Forest:
>
> mahout org.apache.mahout.df.mapreduce.BuildForest
> -Dmapred.max.split.size=1874231 -oob -d /user/an32665/KDD/KDDTrain+.arff
> -ds /user/an32665/KDD/KDDTrain+.info -sl 5 -p -t 100 -o nsl-forest
>
> Testing Forest:
>
> mahout org.apache.mahout.df.mapreduce.TestForest -i
> /user/an32665/KDD/KDDTest+.arff -ds /user/an32665/KDD/KDDTrain+.info -m
> nsl-forest -a -mr -o predictions
>
>
>
> So while building the forest we use "-p" to select the partial
> implementation. I just wanted to know the difference in the algorithm when
> we use "-p" and when we don't use "-p".
>
>
>
>
>
> Regards,
>
> Akshay Nowal
>
>
>
>
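
As a side note on the quoted Describe command: the string passed to "-d" appears to use a shorthand in which a number repeats the attribute type that follows it. The sketch below expands it under that assumption, and under the further assumption that N, C and L stand for numerical, categorical and label attributes. The helper name expand_descriptor is hypothetical and is not part of Mahout.

```python
def expand_descriptor(descriptor):
    """Expand a shorthand such as "N 3 C L" into ["N", "C", "C", "C", "L"].

    Assumed rule: a numeric token repeats the next non-numeric token.
    """
    expanded = []
    repeat = 1
    for token in descriptor.split():
        if token.isdigit():
            repeat = int(token)
        else:
            expanded.extend([token] * repeat)
            repeat = 1
    return expanded


attrs = expand_descriptor("N 3 C 2 N C 4 N C 8 N 2 C 19 N L")
print(len(attrs))  # 42 attributes in total under the assumed rule
```

Under this reading the descriptor covers 34 numerical attributes, 7 categorical attributes, and 1 label.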
