spark-issues mailing list archives

From "Xin Ren (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (SPARK-15509) R MLlib algorithms should support input columns "features" and "label"
Date Mon, 30 May 2016 22:23:12 GMT

     [ https://issues.apache.org/jira/browse/SPARK-15509?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xin Ren updated SPARK-15509:
----------------------------
    Description: 
Currently in SparkR, when you load a LibSVM dataset using the sqlContext and then pass it
to an MLlib algorithm, the ML wrappers will fail since they will try to create a "features"
column, which conflicts with the existing "features" column from the LibSVM loader.  E.g.,
using the "mnist" dataset from LibSVM:

{code}
training <- loadDF(sqlContext, ".../mnist", "libsvm")
model <- naiveBayes(label ~ features, training)
{code}

This fails with:
{code}
16/05/24 11:52:41 ERROR RBackendHandler: fit on org.apache.spark.ml.r.NaiveBayesWrapper failed
Error in invokeJava(isStatic = TRUE, className, methodName, ...) : 
  java.lang.IllegalArgumentException: Output column features already exists.
	at org.apache.spark.ml.feature.VectorAssembler.transformSchema(VectorAssembler.scala:120)
	at org.apache.spark.ml.Pipeline$$anonfun$transformSchema$4.apply(Pipeline.scala:179)
	at org.apache.spark.ml.Pipeline$$anonfun$transformSchema$4.apply(Pipeline.scala:179)
	at scala.collection.IndexedSeqOptimized$class.foldl(IndexedSeqOptimized.scala:57)
	at scala.collection.IndexedSeqOptimized$class.foldLeft(IndexedSeqOptimized.scala:66)
	at scala.collection.mutable.ArrayOps$ofRef.foldLeft(ArrayOps.scala:186)
	at org.apache.spark.ml.Pipeline.transformSchema(Pipeline.scala:179)
	at org.apache.spark.ml.PipelineStage.transformSchema(Pipeline.scala:67)
	at org.apache.spark.ml.Pipeline.fit(Pipeline.scala:131)
	at org.apache.spark.ml.feature.RFormula.fit(RFormula.scala:169)
	at org.apache.spark.ml.r.NaiveBayesWrapper$.fit(NaiveBayesWrapper.scala:62)
	at org.apache.spark.ml.r.NaiveBayesWrapper.fit(NaiveBayesWrapper.sca
{code}

The same issue appears for the "label" column once you rename the "features" column.

  was:
Currently in SparkR, when you load a LibSVM dataset using the sqlContext and then pass it
to an MLlib algorithm, the ML wrappers will fail since they will try to create a "features"
column, which conflicts with the existing "features" column from the LibSVM loader.  E.g.,
using the "mnist" dataset from LibSVM:

{code}
training <- loadDF(sqlContext, ".../mnist", "libsvm")
model <- spark.naiveBayes(label ~ features, training)
{code}

This fails with:
{code}
16/05/24 11:52:41 ERROR RBackendHandler: fit on org.apache.spark.ml.r.NaiveBayesWrapper failed
Error in invokeJava(isStatic = TRUE, className, methodName, ...) : 
  java.lang.IllegalArgumentException: Output column features already exists.
	at org.apache.spark.ml.feature.VectorAssembler.transformSchema(VectorAssembler.scala:120)
	at org.apache.spark.ml.Pipeline$$anonfun$transformSchema$4.apply(Pipeline.scala:179)
	at org.apache.spark.ml.Pipeline$$anonfun$transformSchema$4.apply(Pipeline.scala:179)
	at scala.collection.IndexedSeqOptimized$class.foldl(IndexedSeqOptimized.scala:57)
	at scala.collection.IndexedSeqOptimized$class.foldLeft(IndexedSeqOptimized.scala:66)
	at scala.collection.mutable.ArrayOps$ofRef.foldLeft(ArrayOps.scala:186)
	at org.apache.spark.ml.Pipeline.transformSchema(Pipeline.scala:179)
	at org.apache.spark.ml.PipelineStage.transformSchema(Pipeline.scala:67)
	at org.apache.spark.ml.Pipeline.fit(Pipeline.scala:131)
	at org.apache.spark.ml.feature.RFormula.fit(RFormula.scala:169)
	at org.apache.spark.ml.r.NaiveBayesWrapper$.fit(NaiveBayesWrapper.scala:62)
	at org.apache.spark.ml.r.NaiveBayesWrapper.fit(NaiveBayesWrapper.sca
{code}

The same issue appears for the "label" column once you rename the "features" column.


> R MLlib algorithms should support input columns "features" and "label"
> ----------------------------------------------------------------------
>
>                 Key: SPARK-15509
>                 URL: https://issues.apache.org/jira/browse/SPARK-15509
>             Project: Spark
>          Issue Type: Improvement
>          Components: ML, SparkR
>            Reporter: Joseph K. Bradley
>
> Currently in SparkR, when you load a LibSVM dataset using the sqlContext and then pass it
> to an MLlib algorithm, the ML wrappers will fail since they will try to create a "features"
> column, which conflicts with the existing "features" column from the LibSVM loader.  E.g.,
> using the "mnist" dataset from LibSVM:
> {code}
> training <- loadDF(sqlContext, ".../mnist", "libsvm")
> model <- naiveBayes(label ~ features, training)
> {code}
> This fails with:
> {code}
> 16/05/24 11:52:41 ERROR RBackendHandler: fit on org.apache.spark.ml.r.NaiveBayesWrapper failed
> Error in invokeJava(isStatic = TRUE, className, methodName, ...) : 
>   java.lang.IllegalArgumentException: Output column features already exists.
> 	at org.apache.spark.ml.feature.VectorAssembler.transformSchema(VectorAssembler.scala:120)
> 	at org.apache.spark.ml.Pipeline$$anonfun$transformSchema$4.apply(Pipeline.scala:179)
> 	at org.apache.spark.ml.Pipeline$$anonfun$transformSchema$4.apply(Pipeline.scala:179)
> 	at scala.collection.IndexedSeqOptimized$class.foldl(IndexedSeqOptimized.scala:57)
> 	at scala.collection.IndexedSeqOptimized$class.foldLeft(IndexedSeqOptimized.scala:66)
> 	at scala.collection.mutable.ArrayOps$ofRef.foldLeft(ArrayOps.scala:186)
> 	at org.apache.spark.ml.Pipeline.transformSchema(Pipeline.scala:179)
> 	at org.apache.spark.ml.PipelineStage.transformSchema(Pipeline.scala:67)
> 	at org.apache.spark.ml.Pipeline.fit(Pipeline.scala:131)
> 	at org.apache.spark.ml.feature.RFormula.fit(RFormula.scala:169)
> 	at org.apache.spark.ml.r.NaiveBayesWrapper$.fit(NaiveBayesWrapper.scala:62)
> 	at org.apache.spark.ml.r.NaiveBayesWrapper.fit(NaiveBayesWrapper.sca
> {code}
> The same issue appears for the "label" column once you rename the "features" column.
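
The conflict described above can be illustrated with a small toy model. This is a sketch only, written in plain Python rather than SparkR so that it runs without a Spark installation. It is not Spark's implementation: the helper names `assemble_schema` and `fit_formula_schema`, the renamed column `libsvm_features`, and the wording of the "label" error are all hypothetical. The only assumptions are the ones the report itself states, namely that the wrapper always tries to emit columns named "features" and "label" and refuses to overwrite an existing column.

```python
# Toy model of the schema check behind "Output column features already exists."
# NOT Spark code: every function and column name here is hypothetical.

def assemble_schema(columns, output_col):
    """Return the column list with output_col appended, refusing to overwrite."""
    if output_col in columns:
        raise ValueError(f"Output column {output_col} already exists.")
    return columns + [output_col]


def fit_formula_schema(columns, features_col="features", label_col="label"):
    """Mimic a formula stage that always emits fixed-name output columns."""
    with_features = assemble_schema(columns, features_col)
    return assemble_schema(with_features, label_col)


# Columns as produced by the LibSVM loader in the report.
libsvm_columns = ["label", "features"]

try:
    fit_formula_schema(libsvm_columns)
    first_error = None
except ValueError as err:
    first_error = str(err)

# Renaming only "features" moves the conflict to "label".
renamed_columns = ["label", "libsvm_features"]

try:
    fit_formula_schema(renamed_columns)
    second_error = None
except ValueError as err:
    second_error = str(err)

# With neither reserved name present, the schema check passes.
clean_schema = fit_formula_schema(["y", "x"])
```

In this model the failure goes away only when neither reserved name is present in the input, which is why the issue asks for the wrappers themselves to accept existing "features" and "label" columns instead of requiring users to rename both.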



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


