mahout-user mailing list archives

From "j.barrett Strausser" <>
Subject Data(Set) creation for train and test.
Date Mon, 03 Feb 2014 21:01:50 GMT
Two-part question.

1. String Descriptor for input data

Can anyone confirm my reasoning on the following?

I believe the code below does the following: the first column is the
feature to be predicted (the label), and all other columns are used in
tree construction, e.g. as variables to split on.

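import org.apache.mahout.classifier.df.data.DataLoader

// "L N N": column 0 is the label, columns 1 and 2 are numerical attributes
val descriptor = "L N N"
// fileAsStringArray is my own helper, not a Mahout method
val trainDataValues: Array[String] = fileAsStringArray("myTrainFile.csv")
// regression = false: the label is treated as a classification target
val data = DataLoader.loadData(
  DataLoader.generateDataset(descriptor, false, trainDataValues), trainDataValues)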

where my "myTrainFile.csv" has a form like

"A", .45,.55
"B", 33.3, 22.3
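
To make the reasoning above concrete, here is a minimal sketch of how a descriptor such as "L N N" can be read as one role per column: in Mahout descriptors "L" marks the label and "N" a numerical attribute ("C" and "I" mark categorical and ignored columns, not handled here). The functions parseDescriptor and splitRow and the Role type are hypothetical illustrations, not Mahout's own parser, and it is assumed that rows are plain comma-separated values with an optionally quoted label.

```scala
// Hypothetical sketch, NOT Mahout's own parser: it only mimics how a
// descriptor such as "L N N" assigns a role to each CSV column.
sealed trait Role
case object Label extends Role
case object Numerical extends Role

// Turn "L N N" into one Role per column (only L and N are handled here).
def parseDescriptor(descriptor: String): Vector[Role] =
  descriptor.trim.split("\\s+").toVector.map {
    case "L"   => Label
    case "N"   => Numerical
    case other => throw new IllegalArgumentException(s"unsupported token: $other")
  }

// Split one CSV row into (label, numerical features) according to the roles.
def splitRow(roles: Vector[Role], line: String): (String, Vector[Double]) = {
  val tokens = line.split(",").map(_.trim).toVector
  require(tokens.length == roles.length,
    s"expected ${roles.length} columns, found ${tokens.length}")
  val labelIdx = roles.indexOf(Label)
  require(labelIdx >= 0, "descriptor has no label column")
  // Strip the surrounding quotes from the label token, e.g. "A" -> A
  val label = tokens(labelIdx).stripPrefix("\"").stripSuffix("\"")
  // Every column marked N becomes a numerical feature (a split candidate)
  val features = tokens.indices.collect {
    case i if roles(i) == Numerical => tokens(i).toDouble
  }.toVector
  (label, features)
}

val roles = parseDescriptor("L N N")
println(splitRow(roles, "\"A\", .45,.55"))    // (A,Vector(0.45, 0.55))
println(splitRow(roles, "\"B\", 33.3, 22.3")) // (B,Vector(33.3, 22.3))
```

Under this reading, the label column is never a split candidate; only the columns marked N feed the trees.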

2. String Descriptor for test data

I'm now given a new file, "myTestData.csv".

This data has no labels but is otherwise the same as above, so if I
attempt to create a dataset from it, an error is thrown complaining of no
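
The error message above is cut off, so the following is only a guess at the cause: assuming the loader expects one token per descriptor entry, an unlabeled row is one column short of what "L N N" declares. The functions columnsExpected and checkRow are hypothetical and only illustrate that mismatch; they are not Mahout code and do not reproduce Mahout's actual error.

```scala
// Hypothetical illustration, NOT Mahout code: each row is checked against
// the number of columns the training descriptor declares.
def columnsExpected(descriptor: String): Int =
  descriptor.trim.split("\\s+").length

def checkRow(descriptor: String, line: String): Either[String, List[String]] = {
  val tokens = line.split(",").map(_.trim).toList
  val expected = columnsExpected(descriptor)
  if (tokens.length == expected) Right(tokens)
  else Left(s"expected $expected columns, found ${tokens.length}")
}

val trainDescriptor = "L N N"
// A labeled training row has all three columns, so it is accepted.
println(checkRow(trainDescriptor, "\"A\", .45,.55").isRight) // true
// An unlabeled test row is one column short, so it is rejected.
println(checkRow(trainDescriptor, ".45,.55")) // Left(expected 3 columns, found 2)
```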

All I want is to be able to call forest.classify(..., ...), but I'm not
sure how to construct my training dataset correctly.

I cannot simply split the original dataset as is done in most examples.

Any examples showing test data construction independent of the original
training set would be appreciated.

