spark-issues mailing list archives

From "Apache Spark (JIRA)" <>
Subject [jira] [Commented] (SPARK-25393) Parsing CSV strings in a column
Date Mon, 10 Sep 2018 09:17:00 GMT


Apache Spark commented on SPARK-25393:

User 'MaxGekk' has created a pull request for this issue:

> Parsing CSV strings in a column
> -------------------------------
>                 Key: SPARK-25393
>                 URL:
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 2.4.0
>            Reporter: Maxim Gekk
>            Priority: Minor
> There are use cases where CSV-formatted content is dumped into external storage as
> one of the columns. For example, CSV records are stored together with other meta-info in Kafka.
> The current Spark API doesn't allow parsing such columns directly: the existing [csv()|] method
> requires a dataset with a single string column, which makes it inconvenient to parse a CSV
> column in a dataset with many columns. This ticket aims to add a new function, similar to
> [from_json()|], with the following signature in Scala:
> {code:scala}
> def from_csv(e: Column, schema: StructType, options: Map[String, String]): Column
> {code}
> and, for use from Python, R and Java:
> {code:scala}
> def from_csv(e: Column, schema: String, options: java.util.Map[String, String]): Column
> {code} 
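Assuming the Scala signature quoted above (which later shipped in Spark's `functions` object), a usage sketch might look like the following. The column names, schema, and sample data here are illustrative only, not taken from the ticket:

{code:scala}
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, from_csv}
import org.apache.spark.sql.types.{IntegerType, StringType, StructType}

// Sketch of parsing a CSV string column with the proposed from_csv.
// Column names and schema below are hypothetical examples.
object FromCsvExample {
  def parse(): Seq[(String, Int, String)] = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("from_csv-sketch")
      .getOrCreate()
    import spark.implicits._

    // CSV records stored alongside other meta-info, e.g. a Kafka key.
    val df = Seq(("k1", "1,Alice"), ("k2", "2,Bob")).toDF("key", "record")

    // Schema of the embedded CSV payload.
    val schema = new StructType()
      .add("id", IntegerType)
      .add("name", StringType)

    // Parse the CSV column into a struct, then flatten its fields.
    val parsed = df
      .select(col("key"), from_csv(col("record"), schema, Map.empty[String, String]).as("rec"))
      .select(col("key"), col("rec.id"), col("rec.name"))
      .as[(String, Int, String)]
      .collect()
      .toSeq

    spark.stop()
    parsed
  }
}
{code}

Unlike the existing csv() reader, this keeps the other columns of the dataset (here, "key") available alongside the parsed fields.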

This message was sent by Atlassian JIRA

