spark-issues mailing list archives

From "Apache Spark (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SPARK-27621) Calling transform() method on a LinearRegressionModel throws NoSuchElementException
Date Thu, 02 May 2019 09:24:00 GMT

    [ https://issues.apache.org/jira/browse/SPARK-27621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16831501#comment-16831501 ]

Apache Spark commented on SPARK-27621:
--------------------------------------

User 'ancasarb' has created a pull request for this issue:
https://github.com/apache/spark/pull/24509

> Calling transform() method on a LinearRegressionModel throws NoSuchElementException
> -----------------------------------------------------------------------------------
>
>                 Key: SPARK-27621
>                 URL: https://issues.apache.org/jira/browse/SPARK-27621
>             Project: Spark
>          Issue Type: Bug
>          Components: ML
>    Affects Versions: 2.3.0, 2.3.1, 2.3.2, 2.3.3, 2.3.4, 2.4.0, 2.4.1, 2.4.2
>            Reporter: Anca Sarb
>            Priority: Minor
>   Original Estimate: 2h
>  Remaining Estimate: 2h
>
> When the transform(...) method is called on a LinearRegressionModel created directly from
> its coefficients and intercept, the following exception is thrown:
> {code:java}
> java.util.NoSuchElementException: Failed to find a default value for loss
>     at org.apache.spark.ml.param.Params$$anonfun$getOrDefault$2.apply(params.scala:780)
>     at org.apache.spark.ml.param.Params$$anonfun$getOrDefault$2.apply(params.scala:780)
>     at scala.Option.getOrElse(Option.scala:121)
>     at org.apache.spark.ml.param.Params$class.getOrDefault(params.scala:779)
>     at org.apache.spark.ml.PipelineStage.getOrDefault(Pipeline.scala:42)
>     at org.apache.spark.ml.param.Params$class.$(params.scala:786)
>     at org.apache.spark.ml.PipelineStage.$(Pipeline.scala:42)
>     at org.apache.spark.ml.regression.LinearRegressionParams$class.validateAndTransformSchema(LinearRegression.scala:111)
>     at org.apache.spark.ml.regression.LinearRegressionModel.validateAndTransformSchema(LinearRegression.scala:637)
>     at org.apache.spark.ml.PredictionModel.transformSchema(Predictor.scala:192)
>     at org.apache.spark.ml.PipelineModel$$anonfun$transformSchema$5.apply(Pipeline.scala:311)
>     at org.apache.spark.ml.PipelineModel$$anonfun$transformSchema$5.apply(Pipeline.scala:311)
>     at scala.collection.IndexedSeqOptimized$class.foldl(IndexedSeqOptimized.scala:57)
>     at scala.collection.IndexedSeqOptimized$class.foldLeft(IndexedSeqOptimized.scala:66)
>     at scala.collection.mutable.ArrayOps$ofRef.foldLeft(ArrayOps.scala:186)
>     at org.apache.spark.ml.PipelineModel.transformSchema(Pipeline.scala:311)
>     at org.apache.spark.ml.PipelineStage.transformSchema(Pipeline.scala:74)
>     at org.apache.spark.ml.PipelineModel.transform(Pipeline.scala:305)
> {code}
> This is because validateAndTransformSchema() is called during both the training and scoring
> phases, but I think the checks against training-related params such as loss should be
> performed during the training phase only. Please correct me if I'm missing anything.
> This issue was first reported for mleap ([combust/mleap#455|https://github.com/combust/mleap/issues/455]).
> When we serialize Spark transformers for mleap, we serialize only the params that are
> relevant for scoring. We can de-serialize the serialized transformers back into Spark for
> scoring, but in that case the training params are no longer available.
> Test to reproduce in PR: [https://github.com/apache/spark/pull/24509]
>  
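The failure mode described in the quoted report can be illustrated with a small, Spark-free
sketch. Everything in it is hypothetical and for illustration only: ToyParams stands in for a
Spark ML param map, and validateUnguarded / validateGuarded are made-up names, not Spark
methods and not necessarily what the linked pull request does. It shows a model restored with
scoring params only failing when a training-only param is read unconditionally, and passing
when that read is limited to the fitting phase.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.NoSuchElementException;

public class ScoringParamsSketch {

    /** Hypothetical stand-in for a param map: explicitly set values plus defaults. */
    public static final class ToyParams {
        private final Map<String, String> values = new HashMap<>();
        private final Map<String, String> defaults = new HashMap<>();

        public ToyParams set(String name, String value) {
            values.put(name, value);
            return this;
        }

        public ToyParams setDefault(String name, String value) {
            defaults.put(name, value);
            return this;
        }

        /** Returns the set value, else the default, else throws NoSuchElementException. */
        public String getOrDefault(String name) {
            if (values.containsKey(name)) {
                return values.get(name);
            }
            if (defaults.containsKey(name)) {
                return defaults.get(name);
            }
            throw new NoSuchElementException("Failed to find a default value for " + name);
        }
    }

    /** Training-only check: reads "loss", so it throws if "loss" has no value or default. */
    private static void checkTrainingParams(ToyParams params) {
        String loss = params.getOrDefault("loss");
        if (!loss.equals("squaredError") && !loss.equals("huber")) {
            throw new IllegalArgumentException("Unsupported loss: " + loss);
        }
    }

    /** Ignores the fitting flag, so the training-only check also runs while scoring. */
    public static void validateUnguarded(ToyParams params, boolean fitting) {
        checkTrainingParams(params);
    }

    /** Runs the training-only check only when fitting. */
    public static void validateGuarded(ToyParams params, boolean fitting) {
        if (fitting) {
            checkTrainingParams(params);
        }
    }

    public static void main(String[] args) {
        // A model restored with scoring params only: "loss" is neither set nor defaulted.
        ToyParams restored = new ToyParams()
                .set("featuresCol", "features")
                .set("predictionCol", "prediction");

        try {
            validateUnguarded(restored, false);
        } catch (NoSuchElementException e) {
            System.out.println("unguarded scoring failed: " + e.getMessage());
        }

        validateGuarded(restored, false);
        System.out.println("guarded scoring passed");
    }
}
```

With the guard in place, a model that carries only scoring params can still be validated for
scoring, while a fit on the same toy params would still surface the missing loss value.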



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org

