spark-issues mailing list archives

From "eugen yushin (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (SPARK-21442) Spark CSV writer trims trailing spaces
Date Mon, 17 Jul 2017 14:44:00 GMT

     [ https://issues.apache.org/jira/browse/SPARK-21442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

eugen yushin updated SPARK-21442:
---------------------------------
    Environment: 
version 2.1.0-mapr-1703
Using Scala version 2.11.8 (OpenJDK 64-Bit Server VM, Java 1.8.0_131)

and 

version 2.1.1
Using Scala version 2.11.8 (OpenJDK 64-Bit Server VM, Java 1.8.0_131)

  was:
Version 2.1.0-mapr-1703
Using Scala version 2.11.8 (OpenJDK 64-Bit Server VM, Java 1.8.0_131)



> Spark CSV writer trims trailing spaces
> --------------------------------------
>
>                 Key: SPARK-21442
>                 URL: https://issues.apache.org/jira/browse/SPARK-21442
>             Project: Spark
>          Issue Type: Bug
>          Components: Input/Output
>    Affects Versions: 2.1.0, 2.1.1
>         Environment: version 2.1.0-mapr-1703
> Using Scala version 2.11.8 (OpenJDK 64-Bit Server VM, Java 1.8.0_131)
> and 
> version 2.1.1
> Using Scala version 2.11.8 (OpenJDK 64-Bit Server VM, Java 1.8.0_131)
>            Reporter: eugen yushin
>
> It looks like Spark truncates trailing spaces when saving data with the CSV codec. See the following example for details (note the extra space at the end of the "Johny " field):
> {code}
> scala> case class SampleRow(field1: String, field2: String)
> defined class SampleRow
> scala> val fooDS = Seq(SampleRow("Johny ", "Doe"), SampleRow("Ivan", "Susanin")).toDS()
> fooDS: org.apache.spark.sql.Dataset[SampleRow] = [field1: string, field2: string]
> scala> fooDS.collect.foreach(println)
> SampleRow(Johny ,Doe)
> SampleRow(Ivan,Susanin)
> scala> fooDS.show()
> +------+-------+
> |field1| field2|
> +------+-------+
> |Johny |    Doe|
> |  Ivan|Susanin|
> +------+-------+
> scala> import org.apache.spark.sql.SaveMode
> import org.apache.spark.sql.SaveMode
> scala> fooDS.write.option("delimiter", "|").mode(SaveMode.Overwrite).csv("file:///tmp/spaces.txt")
> $ cat /tmp/spaces.txt/*
> Johny|Doe
> Ivan|Susanin
> {code}
> I expect a space before the pipe on the first line of the output file.
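
A possible way to check the behaviour outside the shell is sketched below. It assumes a Spark release (2.2.0 or later) whose CSV writer accepts the `ignoreTrailingWhiteSpace` and `ignoreLeadingWhiteSpace` write options; these are not available in the affected 2.1.x versions, so this is a sketch of a workaround after upgrading rather than a fix for the reported versions. The object name `TrailingSpaceCheck`, the helper `writeAndReadBack`, and the local master setting are hypothetical and only there to make the example self-contained.

```scala
import org.apache.spark.sql.{SaveMode, SparkSession}

// Same shape as the case class in the report.
case class SampleRow(field1: String, field2: String)

// Hypothetical helper object, not part of Spark.
object TrailingSpaceCheck {

  // Writes the sample rows as pipe-delimited CSV with whitespace trimming
  // turned off, then reads the files back as raw text lines.
  def writeAndReadBack(spark: SparkSession, path: String): Array[String] = {
    import spark.implicits._

    val fooDS = Seq(SampleRow("Johny ", "Doe"), SampleRow("Ivan", "Susanin")).toDS()

    fooDS.write
      .option("delimiter", "|")
      // Assumption: Spark 2.2.0+; ask the writer to keep surrounding spaces.
      .option("ignoreTrailingWhiteSpace", "false")
      .option("ignoreLeadingWhiteSpace", "false")
      .mode(SaveMode.Overwrite)
      .csv(path)

    // Read the part files back as plain text to inspect the exact bytes written.
    spark.read.textFile(path).collect()
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")          // assumption: local run for illustration
      .appName("SPARK-21442-check")
      .getOrCreate()

    // Brackets make any trailing space visible in the console output.
    writeAndReadBack(spark, "file:///tmp/spaces.txt").foreach(l => println(s"[$l]"))

    spark.stop()
  }
}
```

With the two options set to "false", the first output line should keep the space before the pipe. Reading the result back with `textFile` instead of `cat` keeps the check in one program.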



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


