flink-dev mailing list archives

From Till Rohrmann <trohrm...@apache.org>
Subject Re: MultipleLinearRegression - Strange results
Date Mon, 01 Jun 2015 10:33:51 GMT
Since MLR uses stochastic gradient descent (SGD), you probably have to tune
the step size. SGD is very sensitive to the choice of step size: if it is
too large, the algorithm does not converge. You can find the parameter
description here [1].

Cheers,
Till

[1]
http://ci.apache.org/projects/flink/flink-docs-master/libs/ml/multiple_linear_regression.html
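
To illustrate the step size point, here is a minimal sketch in plain Python. It is not Flink's implementation; the toy data (one feature in the thousands, label = 2 * feature), the helper names and the two step sizes are all assumptions chosen for illustration. It runs the same SGD update with a large and a small step size and prints the resulting squared error.

```python
import random

def sgd_least_squares(data, stepsize, iterations, seed=0):
    """Plain SGD on squared loss; returns [feature weights..., intercept]."""
    rng = random.Random(seed)
    weights = [0.0] * (len(data[0][0]) + 1)
    for _ in range(iterations):
        features, label = rng.choice(data)
        x = list(features) + [1.0]  # the constant 1.0 carries the intercept
        residual = sum(w * xi for w, xi in zip(weights, x)) - label
        # The gradient of 0.5 * residual**2 w.r.t. the weights is residual * x.
        weights = [w - stepsize * residual * xi for w, xi in zip(weights, x)]
    return weights

def squared_error(data, weights):
    """Sum of squared residuals of the linear model over the data."""
    total = 0.0
    for features, label in data:
        x = list(features) + [1.0]
        total += (sum(w * xi for w, xi in zip(weights, x)) - label) ** 2
    return total

# Hypothetical toy data: unscaled feature values in the thousands.
data = [((x,), 2.0 * x) for x in (1500.0, 2000.0, 2500.0, 3000.0, 4000.0, 4500.0)]

# Assumed step sizes: 0.1 is far too large for features of this magnitude,
# 1e-8 is small enough for each update to shrink the residual.
too_large = squared_error(data, sgd_least_squares(data, stepsize=0.1, iterations=10))
small_enough = squared_error(data, sgd_least_squares(data, stepsize=1e-8, iterations=1000))

print("squared error, stepsize 0.1: ", too_large)
print("squared error, stepsize 1e-8:", small_enough)
```

With the large step size every update overshoots and the weights grow by orders of magnitude per iteration, which is the kind of behaviour that yields astronomically large predictions. Rescaling the features to a similar, small range before training is another common way to make a moderate step size usable.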

On Mon, Jun 1, 2015 at 11:48 AM, Felix Neutatz <neutatz@googlemail.com>
wrote:

> Hi,
>
> I want to use MultipleLinearRegression, but I got really strange results.
> So I tested it with the housing price dataset:
>
> http://archive.ics.uci.edu/ml/machine-learning-databases/housing/housing.data
>
> And here I get negative house prices - even when I use the training set as
> dataset:
> LabeledVector(-1.1901998613214253E78, DenseVector(1500.0, 2197.0, 2978.0,
> 1369.0, 1451.0))
> LabeledVector(-2.7411218018254747E78, DenseVector(4445.0, 4522.0, 4038.0,
> 4223.0, 4868.0))
> LabeledVector(-2.688526857613956E78, DenseVector(4522.0, 4038.0, 4351.0,
> 4129.0, 4617.0))
> LabeledVector(-1.3075960386971714E78, DenseVector(2001.0, 2059.0, 1992.0,
> 2008.0, 2504.0))
> LabeledVector(-1.476238770814297E78, DenseVector(1992.0, 1965.0, 1983.0,
> 2300.0, 3811.0))
> LabeledVector(-1.4298128754759792E78, DenseVector(2059.0, 1992.0, 1965.0,
> 2425.0, 3178.0))
> ...
>
> and a huge squared error:
> Squared error: 4.799184832395361E159
>
> You can find my code here:
>
> https://github.com/FelixNeutatz/wikiTrends/blob/master/extraction/src/test/io/sanfran/wikiTrends/extraction/flink/Regression.scala
>
> Can you help me? What did I do wrong?
>
> Thank you for your help,
> Felix
>
