flink-user mailing list archives

From Simone Robutti <simone.robu...@radicalbit.io>
Subject Re: SVM classification problem.
Date Sun, 02 Oct 2016 13:06:32 GMT
No, you don't get 100% accuracy in this case. You don't even want that: it
would be a severe case of overfitting. You would get it only if your
dataset is linearly separable, or separable with a finely tuned kernel, but
in that case SVM would be overkill and more traditional methods would
suffice.
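
To make the accuracy point concrete, here is a minimal sketch in plain Scala
(no Flink needed) that scores the (truth, prediction) pairs from the output
quoted below. The helper `sameClass` is hypothetical, and it assumes that a
truth label of 0.0 is meant as the negative class, which the SVM reports as
-1.0.

```scala
// (truth, prediction) pairs copied from the output quoted below.
val pairs = Seq(
  (1.0, 1.0), (1.0, 1.0), (0.0, 1.0),
  (0.0, -1.0), (0.0, 1.0), (0.0, -1.0)
)

// Assumption: truth 0.0 stands for the negative class, predicted as -1.0.
// Two labels agree when they fall on the same side of zero.
def sameClass(truth: Double, prediction: Double): Boolean =
  (truth > 0.0) == (prediction > 0.0)

val correct = pairs.count { case (t, p) => sameClass(t, p) }
val accuracy = correct.toDouble / pairs.size

println(s"$correct of ${pairs.size} predictions match")
```

Under that reading, 4 of the 6 training examples are classified correctly,
so the model does not reach 100% even on its own training set.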

Flink's SVM implementation for binary classification returns "-1" as the
default label for the "negative" class. It's a rather raw implementation, so
it's better to use it only if you have a clear idea of the underlying
process; otherwise you could run into problems if you treat it as a black
box, as you would with more mature ML libraries.
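
One way to avoid the surprise is to encode the negative class as -1.0
before training, so that labels and predictions use the same values. The
sketch below shows only the label mapping, in plain Scala; `toSvmLabel` is a
hypothetical helper, not part of FlinkML, and it assumes the input labels
are 0.0 and 1.0 as in the quoted code.

```scala
// Hypothetical helper (not a FlinkML function): map a {0.0, 1.0} label
// to the {-1.0, +1.0} encoding that the SVM predictions use.
def toSvmLabel(label: Double): Double =
  if (label > 0.0) 1.0 else -1.0

// Labels of the six training examples from the quoted code.
val originalLabels = Seq(1.0, 1.0, 0.0, 0.0, 0.0, 0.0)
val svmLabels = originalLabels.map(toSvmLabel)

println(svmLabels.mkString(", "))
```

In the quoted program this would presumably be applied while building the
training set, for example by mapping each element to
LabeledVector(toSvmLabel(lv.label), lv.vector) before calling fit. I have
not run that against Flink, so treat it as an outline.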

2016-09-30 22:52 GMT+02:00 Kürşat Kurt <kursat@kursatkurt.com>:

> Hi,
>
> I am trying to train and predict with the same set. I expect that accuracy
> should be 100%, am I wrong?
>
> If I try to predict with the same set, it fails; it also classifies some
> examples as “-1”, which is not in the training set.
>
> What is wrong with this code?
>
> Code:
>
> def main(args: Array[String]): Unit = {
>     val env = ExecutionEnvironment.getExecutionEnvironment
>     val training = Seq(
>       new LabeledVector(1.0, new SparseVector(10, Array(0, 2, 3), Array(1.0, 1.0, 1.0))),
>       new LabeledVector(1.0, new SparseVector(10, Array(0, 1, 5, 9), Array(1.0, 1.0, 1.0, 1.0))),
>       new LabeledVector(0.0, new SparseVector(10, Array(0, 2), Array(0.0, 1.0))),
>       new LabeledVector(0.0, new SparseVector(10, Array(0), Array(0.0))),
>       new LabeledVector(0.0, new SparseVector(10, Array(0, 2), Array(0.0, 1.0))),
>       new LabeledVector(0.0, new SparseVector(10, Array(0), Array(0.0))))
>
>     val trainingDS = env.fromCollection(training)
>     val testingDS = env.fromCollection(training)
>     val svm = new SVM().setBlocks(env.getParallelism)
>     svm.fit(trainingDS)
>     val predictions = svm.evaluate(testingDS.map(x => (x.vector, x.label)))
>     predictions.print()
> }
>
> Output:
>
> (1.0,1.0)
> (1.0,1.0)
> (0.0,1.0)
> (0.0,-1.0)
> (0.0,1.0)
> (0.0,-1.0)
