mahout-user mailing list archives

From Suneel Marthi <suneel_mar...@yahoo.com>
Subject Re: Using "split" without partitioning the data to train/test
Date Mon, 31 Mar 2014 14:26:19 GMT


Sent from my iPhone

> On Mar 31, 2014, at 4:20 PM, Mahmood Naderan <nt_mahmood@yahoo.com> wrote:
> 
> Hi,
> In an older version of Mahout, I used wikipediaDataSetCreator on an input to create the training data:
>
>     mahout wikipediaDataSetCreator -i wiki-tr/chunks -o tr-input -c labels.txt
> 
> and then fed the tr-input to the trainclassifier using
> 
>     mahout trainclassifier -i tr-input -o wikimodel
> 
> 
> Now, in Mahout 0.9, I see some examples that use "split" to set aside 80% of the input file as training data:
> 
>     mahout split -i input-vectors --trainingOutput tr-vectors --testOutput ts-vectors --randomSelectionPct 20
> 
> My question is: how can I use "split" without partitioning the input into training and test parts? I want to use one file as the training input and another file as the test input.

So why use 'split' at all? If you already have one file for training and another for testing, keep them separate and skip the split step.
> 
> 
>  
> Regards,
> Mahmood
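
To illustrate the distinction discussed in this thread, here is a minimal Python sketch of a percentage-based random split. It is a conceptual illustration only, not Mahout's implementation; the function name `random_split`, its parameters, and the default seed are hypothetical. When the training and test files already exist separately, no such step is needed.

```python
import random


def random_split(records, test_pct, seed=42):
    """Randomly hold out test_pct percent of records as test data.

    Hypothetical helper for illustration; it does not reproduce
    how Mahout's "split" job works internally.
    """
    records = list(records)
    # Number of records to hold out, e.g. 20% of 100 records is 20.
    n_test = round(len(records) * test_pct / 100)
    # A seeded generator keeps the selection reproducible.
    rng = random.Random(seed)
    test_idx = set(rng.sample(range(len(records)), n_test))
    # Every record goes to exactly one of the two outputs.
    train = [r for i, r in enumerate(records) if i not in test_idx]
    test = [r for i, r in enumerate(records) if i in test_idx]
    return train, test
```

Calling `random_split(range(100), 20)` returns 80 training records and 20 test records, which corresponds to the 80/20 partition described in the question. With two pre-separated files, each one would instead be passed directly as the training or test input.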
