spark-issues mailing list archives

From "Wayne Zhang (JIRA)" <j...@apache.org>
Subject [jira] [Created] (SPARK-20574) Allow Bucketizer to handle non-Double column
Date Wed, 03 May 2017 06:00:12 GMT
Wayne Zhang created SPARK-20574:
-----------------------------------

             Summary: Allow Bucketizer to handle non-Double column
                 Key: SPARK-20574
                 URL: https://issues.apache.org/jira/browse/SPARK-20574
             Project: Spark
          Issue Type: Bug
          Components: ML
    Affects Versions: 2.1.0
            Reporter: Wayne Zhang


Bucketizer currently requires the input column to be of type Double, but its logic should
work on any numeric data type. Many practical problems involve integer or float columns, and
manually casting them to Double before calling Bucketizer quickly becomes tedious. This
transformer could be extended to handle all numeric types.

The example below shows Bucketizer failing on integer data.
{code}
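import org.apache.spark.ml.feature.Bucketizer
// toDF needs spark.implicits._, which spark-shell imports automatically.

val splits = Array(-3.0, 0.0, 3.0)
val data: Array[Int] = Array(-2, -1, 0, 1, 2)
val expectedBuckets = Array(0.0, 0.0, 1.0, 1.0, 1.0)
// "feature" is inferred as IntegerType, "expected" as DoubleType.
val dataFrame = data.zip(expectedBuckets).toSeq.toDF("feature", "expected")
val bucketizer = new Bucketizer()
  .setInputCol("feature")
  .setOutputCol("result")
  .setSplits(splits)
// The call below fails with the exception shown after it.
bucketizer.transform(dataFrame)

java.lang.IllegalArgumentException: requirement failed: Column feature must be of type DoubleType but was actually IntegerType.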
{code}
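
For comparison, the manual cast that this issue describes as tedious might look like the sketch below. It is only an illustration of the current workaround, not the proposed change: the names doubleFrame and castBucketizer are hypothetical, and the snippet assumes a spark-shell session where spark.implicits._ is already in scope.
{code}
import org.apache.spark.ml.feature.Bucketizer
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types.DoubleType

val splits = Array(-3.0, 0.0, 3.0)
val data: Array[Int] = Array(-2, -1, 0, 1, 2)
val expectedBuckets = Array(0.0, 0.0, 1.0, 1.0, 1.0)
val dataFrame = data.zip(expectedBuckets).toSeq.toDF("feature", "expected")

// Workaround: replace the integer column with a Double copy of itself
// so that Bucketizer's input type check passes.
val doubleFrame = dataFrame.withColumn("feature", col("feature").cast(DoubleType))

val castBucketizer = new Bucketizer()
  .setInputCol("feature")
  .setOutputCol("result")
  .setSplits(splits)

// With the cast in place, "result" is expected to match "expected".
castBucketizer.transform(doubleFrame).show()
{code}
Every non-Double numeric input column needs its own cast of this kind, which is the boilerplate that type handling inside Bucketizer would remove.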




