spark-issues mailing list archives

From "Guillaume Dardelet (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (SPARK-12606) Scala/Java compatibility issue Re: how to extend java transformer from Scala UnaryTransformer ?
Date Tue, 04 Apr 2017 14:27:41 GMT

    [ https://issues.apache.org/jira/browse/SPARK-12606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15955191#comment-15955191 ]

Guillaume Dardelet edited comment on SPARK-12606 at 4/4/17 2:27 PM:
--------------------------------------------------------------------

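I had the same issue in Scala and solved it by making the uid a constructor parameter and
adding a no-argument constructor that generates it.

The error comes from the initialisation of the parameter "inputCol". A parameter records its
owner's uid when it is created, and "inputCol" is created while the superclass is being
constructed. You get "null__inputCol" because at that point your class didn't have a uid yet.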

Therefore, instead of

{code}
class Lemmatizer extends UnaryTransformer[String, String, Lemmatizer] {
  override val uid: String = Identifiable.randomUID("lemmatizer")
  protected def createTransformFunc: String => String = ???
  protected def outputDataType: DataType = StringType
}
{code}

Do this:

{code}
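import org.apache.spark.ml.UnaryTransformer
import org.apache.spark.ml.util.Identifiable
import org.apache.spark.sql.types.{DataType, StringType}

// uid is a constructor parameter, so it is already set when the superclass creates its params.
class Lemmatizer(override val uid: String) extends UnaryTransformer[String, String, Lemmatizer] {
  // No-argument constructor that generates a fresh uid.
  def this() = this(Identifiable.randomUID("lemmatizer"))
  protected def createTransformFunc: String => String = ???
  protected def outputDataType: DataType = StringType
}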
{code}
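
The initialisation order behind "null__inputCol" can be reproduced without Spark. The following
is a minimal Java sketch, not Spark code: ParamOwner, LateUidStage and InitOrderDemo are
hypothetical names invented for illustration, and ParamOwner only stands in for a base class
that builds a parameter name from uid() in its constructor. It is written in Java because the
reporter's Stemmer assigns uid in a field initialiser in the same way.

```java
import java.util.UUID;

// Hypothetical stand-in for a base class that creates its params during construction.
abstract class ParamOwner {
    // Name of the param, built from the owner's uid at construction time.
    final String inputColName;

    ParamOwner() {
        // Runs before any subclass field initialiser, so uid() may still return null.
        this.inputColName = uid() + "__inputCol";
    }

    abstract String uid();
}

// Hypothetical subclass that, like the reporter's Stemmer, sets uid in a field initialiser.
class LateUidStage extends ParamOwner {
    // Assigned only after the ParamOwner constructor has returned.
    private final String uid = "stemmer_" + UUID.randomUUID();

    @Override
    String uid() {
        return uid;
    }
}

class InitOrderDemo {
    public static void main(String[] args) {
        LateUidStage stage = new LateUidStage();
        System.out.println(stage.inputColName); // prints "null__inputCol"
        System.out.println(stage.uid().startsWith("stemmer_")); // prints "true"
    }
}
```

The Scala version with uid as a constructor parameter avoids this because, as far as I know,
Scala assigns constructor-parameter fields before the superclass constructor runs, so the
parameter is created with the real uid.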



> Scala/Java compatibility issue Re: how to extend java transformer from Scala UnaryTransformer ?
> -----------------------------------------------------------------------------------------------
>
>                 Key: SPARK-12606
>                 URL: https://issues.apache.org/jira/browse/SPARK-12606
>             Project: Spark
>          Issue Type: Bug
>          Components: ML
>    Affects Versions: 1.5.2
>         Environment: Java 8, Mac OS, Spark-1.5.2
>            Reporter: Andrew Davidson
>              Labels: transformers
>
> Hi Andy,
> I suspect that you hit the Scala/Java compatibility issue. I can reproduce it as well, so could you file a JIRA to track it?
> Yanbo
> 2016-01-02 3:38 GMT+08:00 Andy Davidson <Andy@santacruzintegration.com>:
> I am trying to write a trivial transformer to use in my pipeline. I am using Java and Spark 1.5.2. It was suggested that I use the Tokenizer.scala class as an example. This should be very easy; however, I do not understand Scala, and I am having trouble debugging the following exception.
> Any help would be greatly appreciated.
> Happy New Year
> Andy
> java.lang.IllegalArgumentException: requirement failed: Param null__inputCol does not belong to Stemmer_2f3aa96d-7919-4eaa-ad54-f7c620b92d1c.
> 	at scala.Predef$.require(Predef.scala:233)
> 	at org.apache.spark.ml.param.Params$class.shouldOwn(params.scala:557)
> 	at org.apache.spark.ml.param.Params$class.set(params.scala:436)
> 	at org.apache.spark.ml.PipelineStage.set(Pipeline.scala:37)
> 	at org.apache.spark.ml.param.Params$class.set(params.scala:422)
> 	at org.apache.spark.ml.PipelineStage.set(Pipeline.scala:37)
> 	at org.apache.spark.ml.UnaryTransformer.setInputCol(Transformer.scala:83)
> 	at com.pws.xxx.ml.StemmerTest.test(StemmerTest.java:30)
> public class StemmerTest extends AbstractSparkTest {
>     @Test
>     public void test() {
>         Stemmer stemmer = new Stemmer()
>                                 .setInputCol("raw") //line 30
>                                 .setOutputCol("filtered");
>     }
> }
> /**
>  * @ see spark-1.5.1/mllib/src/main/scala/org/apache/spark/ml/feature/Tokenizer.scala
>  * @ see https://chimpler.wordpress.com/2014/06/11/classifiying-documents-using-naive-bayes-on-apache-spark-mllib/
>  * @ see http://www.tonytruong.net/movie-rating-prediction-with-apache-spark-and-hortonworks/
>  * 
>  * @author andrewdavidson
>  *
>  */
> public class Stemmer extends UnaryTransformer<List<String>, List<String>, Stemmer> implements Serializable {
>     static Logger logger = LoggerFactory.getLogger(Stemmer.class);
>     private static final long serialVersionUID = 1L;
>     private static final ArrayType inputType = DataTypes.createArrayType(DataTypes.StringType, true);
>     private final String uid = Stemmer.class.getSimpleName() + "_" + UUID.randomUUID().toString();
>     @Override
>     public String uid() {
>         return uid;
>     }
>     /*
>        override protected def validateInputType(inputType: DataType): Unit = {
>     require(inputType == StringType, s"Input type must be string type but got $inputType.")
>   }
>      */
>     @Override
>     public void validateInputType(DataType inputTypeArg) {
>         String msg = "inputType must be " + inputType.simpleString() + " but got " + inputTypeArg.simpleString();
>         assert (inputType.equals(inputTypeArg)) : msg; 
>     }
>     
>     @Override
>     public Function1<List<String>, List<String>> createTransformFunc() {
>         // http://stackoverflow.com/questions/6545066/using-scala-from-java-passing-functions-as-parameters
>         Function1<List<String>, List<String>> f = new AbstractFunction1<List<String>, List<String>>() {
>             public List<String> apply(List<String> words) {
>                 for(String word : words) {
>                     logger.error("AEDWIP input word: {}", word);
>                 }
>                 return words;
>             }
>         };
>         
>         return f;
>     }
>     @Override
>     public DataType outputDataType() {
>         return DataTypes.createArrayType(DataTypes.StringType, true);
>     }
> }



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org

