hivemall-issues mailing list archives

From "Takeshi Yamamuro (JIRA)" <j...@apache.org>
Subject [jira] [Created] (HIVEMALL-61) Support a function to convert a comma-separated string into typed data and vice versa
Date Thu, 09 Feb 2017 17:37:41 GMT
Takeshi Yamamuro created HIVEMALL-61:
----------------------------------------

             Summary: Support a function to convert a comma-separated string into typed data and vice versa
                 Key: HIVEMALL-61
                 URL: https://issues.apache.org/jira/browse/HIVEMALL-61
             Project: Hivemall
          Issue Type: New Feature
            Reporter: Takeshi Yamamuro
            Priority: Minor


Spark does not currently have this feature (and IMO it is unlikely to be added to Spark as a first-class feature), but it is useful for ETL before ML processing.
For example:
{code}
scala> val ds1 = Seq("""1,abc""").toDS()
ds1: org.apache.spark.sql.Dataset[String] = [value: string]

scala> import org.apache.spark.sql.types._
import org.apache.spark.sql.types._

scala> val schema = new StructType().add("a", IntegerType).add("b", StringType)
schema: org.apache.spark.sql.types.StructType = StructType(StructField(a,IntegerType,true), StructField(b,StringType,true))

scala> val ds2 = ds1.select(from_csv($"value", schema))
ds2: org.apache.spark.sql.DataFrame = [csvtostruct(value): struct<a: int, b: string>]

scala> ds2.printSchema
root
 |-- csvtostruct(value): struct (nullable = true)
 |    |-- a: integer (nullable = true)
 |    |-- b: string (nullable = true)


scala> ds2.show
+------------------+
|csvtostruct(value)|
+------------------+
|           [1,abc]|
+------------------+
{code}
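The transcript above only shows the string-to-struct direction. The plain-Scala sketch below, written without any Spark dependency, illustrates the intended conversion in both directions under simplifying assumptions: fields are split on bare commas (no quoting, escaping, or null handling), and only integer and string columns are supported. ColType, IntCol, StrCol, CsvRow, fromCsv and toCsv are hypothetical names chosen for illustration; they are not Hivemall or Spark APIs.

{code}
import scala.util.Try

// Hypothetical column types for this sketch; not Spark's DataType hierarchy.
sealed trait ColType
case object IntCol extends ColType
case object StrCol extends ColType

// Hypothetical helper object; not a Hivemall or Spark API.
object CsvRow {
  // Split one line on commas and convert each field according to the schema.
  // Returns None on a column-count mismatch or an unparsable integer.
  // Quoting, escaping and nulls are deliberately ignored in this sketch.
  def fromCsv(line: String, schema: Seq[(String, ColType)]): Option[Seq[Any]] = {
    val fields = line.split(",", -1).toSeq // -1 keeps trailing empty fields
    if (fields.length != schema.length) None
    else {
      val typed: Seq[Option[Any]] = schema.zip(fields).map {
        case ((_, IntCol), raw) => Try(raw.trim.toInt).toOption
        case ((_, StrCol), raw) => Some(raw)
      }
      if (typed.forall(_.isDefined)) Some(typed.flatten) else None
    }
  }

  // The reverse direction: render typed values back into one comma-separated line.
  def toCsv(values: Seq[Any]): String = values.map(_.toString).mkString(",")
}
{code}

With a schema of ("a" -> IntCol, "b" -> StrCol), parsing "1,abc" gives the typed values 1 and "abc", and toCsv turns them back into "1,abc". Returning None rather than throwing mirrors the nullable struct column in the transcript, where a malformed row would presumably become null instead of failing the whole job.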
A related discussion is here: https://github.com/apache/spark/pull/13300#issuecomment-261962773



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
