hivemall-issues mailing list archives

From "Takeshi Yamamuro (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVEMALL-61) Support a function to convert a comma-separated string into typed data and vice versa
Date Mon, 03 Jul 2017 05:50:00 GMT

    [ https://issues.apache.org/jira/browse/HIVEMALL-61?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16071948#comment-16071948 ]

Takeshi Yamamuro commented on HIVEMALL-61:
------------------------------------------

This issue has already been fixed in https://github.com/apache/incubator-hivemall/pull/62/files,
so I'll close this as fixed. Thanks, [~Downchuck], for the suggestion. Yes, we intend to support
any functionality that vanilla Spark does not support, for ease of use. Can you file a new
JIRA for that? Thanks again!

> Support a function to convert a comma-separated string into typed data and vice versa
> -------------------------------------------------------------------------------------
>
>                 Key: HIVEMALL-61
>                 URL: https://issues.apache.org/jira/browse/HIVEMALL-61
>             Project: Hivemall
>          Issue Type: New Feature
>            Reporter: Takeshi Yamamuro
>            Priority: Minor
>
> Currently, Spark does not have this feature (IMO it will not appear as a first-class one in Spark), but it is useful for ETL before ML processing.
> For example:
> {code}
> scala> val ds1 = Seq("""1,abc""").toDS()
> ds1: org.apache.spark.sql.Dataset[String] = [value: string]
> scala> val schema = new StructType().add("a", IntegerType).add("b", StringType)
> schema: org.apache.spark.sql.types.StructType = StructType(StructField(a,IntegerType,true), StructField(b,StringType,true))
> scala> val ds2 = ds1.select(from_csv($"value", schema))
> ds2: org.apache.spark.sql.DataFrame = [csvtostruct(value): struct<a: int, b: string>]
> scala> ds2.printSchema
> root
>  |-- csvtostruct(value): struct (nullable = true)
>  |    |-- a: integer (nullable = true)
>  |    |-- b: string (nullable = true)
> scala> ds2.show
> +------------------+
> |csvtostruct(value)|
> +------------------+
> |           [1,abc]|
> +------------------+
> {code}
> A related discussion is here: https://github.com/apache/spark/pull/13300#issuecomment-261962773
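
For readers without a Spark session at hand, the conversion described in the issue (a comma-separated string into typed values, and back) can be sketched in plain Scala. This is only an illustration under stated assumptions: `CsvSketch`, `FieldType`, `fromCsv`, and `toCsv` are hypothetical names made up for this sketch, not the Hivemall or Spark implementation, and quoting and escaping of commas are ignored.

```scala
// Hypothetical, dependency-free sketch; NOT the Hivemall or Spark implementation.
object CsvSketch {
  // Minimal stand-ins for IntegerType and StringType in the issue's example.
  sealed trait FieldType
  case object IntField extends FieldType
  case object StrField extends FieldType

  // Ordered (name, type) pairs, mirroring the two-column schema "a: int, b: string".
  val schema: Seq[(String, FieldType)] = Seq("a" -> IntField, "b" -> StrField)

  // Parse one comma-separated line into typed values.
  // Returns None when the column count differs or an integer cast fails.
  // Assumption: fields contain no quoted or escaped commas.
  def fromCsv(line: String, schema: Seq[(String, FieldType)]): Option[Seq[Any]] = {
    val tokens = line.split(",", -1).toSeq
    if (tokens.length != schema.length) None
    else {
      val parsed: Seq[Option[Any]] = tokens.zip(schema).map {
        case (tok, (_, IntField)) => scala.util.Try(tok.trim.toInt).toOption
        case (tok, (_, StrField)) => Some(tok)
      }
      if (parsed.forall(_.isDefined)) Some(parsed.map(_.get)) else None
    }
  }

  // The reverse direction: join typed values back into one comma-separated line.
  def toCsv(row: Seq[Any]): String = row.mkString(",")
}
```

With the schema above, `CsvSketch.fromCsv("1,abc", CsvSketch.schema)` yields the typed pair of the integer 1 and the string "abc", and `CsvSketch.toCsv` applied to that pair gives back "1,abc". A line whose first field is not an integer yields `None` rather than throwing.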



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
