spark-dev mailing list archives

From "devl.development" <>
Subject LinearRegressionWithSGD accuracy
Date Thu, 15 Jan 2015 16:46:41 GMT
From what I gather, you use LinearRegressionWithSGD to predict y, the
response variable, given a feature vector x.

In a simple example I used a perfectly linear dataset such that x=y
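
For concreteness, a dataset of this shape could be produced with something like the sketch below. It is plain Scala with no Spark dependency. The "x,y" line layout is inferred from the parsing code in this message; the names LinearData, lines, noiseStd and seed are made up for illustration and are not part of any library.

```scala
import scala.util.Random

object LinearData {
  // Hypothetical helper: returns n text lines of the form "x,y" with
  // y = x + noise, where the noise is Gaussian with the given stddev.
  // noiseStd = 0.0 gives the perfectly linear case y = x.
  def lines(n: Int, noiseStd: Double = 0.0, seed: Long = 42L): Seq[String] = {
    val rng = new Random(seed)
    (1 to n).map { i =>
      val x = i.toDouble
      val y = x + noiseStd * rng.nextGaussian()
      s"$x,$y"
    }
  }
}
```

Writing LinearData.lines(100) to a file, one entry per line, would give an input that the parsing code can read with x in column 0 and y in column 1.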


Using the out-of-box example from the website (with and without scaling):

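    import org.apache.spark.mllib.feature.StandardScaler
    import org.apache.spark.mllib.linalg.Vectors
    import org.apache.spark.mllib.regression.{LabeledPoint, LinearRegressionWithSGD}

    val data = sc.textFile(file)

    // Each input line is "x,y": column 0 is the feature, column 1 the label
    val parsedData = data.map { line =>
      val parts = line.split(',')
      LabeledPoint(parts(1).toDouble, Vectors.dense(parts(0).toDouble)) // y and x
    }

    // Standardize the feature to zero mean and unit variance
    val scaler = new StandardScaler(withMean = true, withStd = true)
      .fit(parsedData.map(x => x.features))
    val scaledData = parsedData
      .map(x => LabeledPoint(x.label, scaler.transform(x.features)))

    // Building the model
    val numIterations = 100
    val model = LinearRegressionWithSGD.train(parsedData, numIterations)

    // Evaluate model on training examples and compute training error
    // (tried using both scaledData and parsedData)
    val valuesAndPreds = parsedData.map { point =>
      val prediction = model.predict(point.features)
      (point.label, prediction)
    }
    val MSE = valuesAndPreds.map { case (v, p) => math.pow(v - p, 2) }.mean()
    println("training Mean Squared Error = " + MSE)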

Both scaled and unscaled attempts give:

training Mean Squared Error = NaN
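
One way to narrow this down is to check whether the predictions themselves are NaN or infinite, since a single NaN term is enough to make the whole mean NaN. The sketch below is plain Scala on a local collection of (label, prediction) pairs, for example a small sample collected from valuesAndPreds. MseCheck, mse and nonFinite are hypothetical names used only for illustration.

```scala
object MseCheck {
  // Mean squared error over (label, prediction) pairs, using the same
  // per-point formula as the code in this message.
  def mse(pairs: Seq[(Double, Double)]): Double =
    pairs.map { case (v, p) => math.pow(v - p, 2) }.sum / pairs.size

  // Number of predictions that are NaN or infinite.
  def nonFinite(pairs: Seq[(Double, Double)]): Int =
    pairs.count { case (_, p) => p.isNaN || p.isInfinity }
}
```

If nonFinite is greater than zero on a sample, the NaN comes from the model's predictions rather than from the error computation.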

I've even tried x with y + (noise sampled from a normal distribution with
mean 0 and stddev 1), and it still comes up with the same thing.

Is this not supposed to work for x and y, i.e. 2-dimensional plots? Is there
something I'm missing, or something wrong in the code above? Or is there a
limitation in the method?

Thanks for any advice.
