spark-issues mailing list archives

From "Apache Spark (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SPARK-15212) CSV file reader when read file with first line schema do not filter blank in schema column name
Date Sun, 08 May 2016 11:31:12 GMT

    [ https://issues.apache.org/jira/browse/SPARK-15212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15275563#comment-15275563 ]

Apache Spark commented on SPARK-15212:
--------------------------------------

User 'WeichenXu123' has created a pull request for this issue:
https://github.com/apache/spark/pull/12987

> CSV file reader when read file with first line schema do not filter blank in schema column name
> -----------------------------------------------------------------------------------------------
>
>                 Key: SPARK-15212
>                 URL: https://issues.apache.org/jira/browse/SPARK-15212
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.6.1, 1.6.2, 2.0.0, 2.1.0
>            Reporter: Weichen Xu
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> For example, run the following code in spark-shell:
> val sqlContext = new org.apache.spark.sql.SQLContext(sc);
> var reader = sqlContext.read
> reader.option("header", true)
> var df = reader.csv("file:///diskext/tdata/spark/d1.csv")
> when the CSV data file contains:
> ----------------------------------------------------------
> col1, col2,col3,col4,col5
> 1997,Ford,E350,"ac, abs, moon",3000.00
> ....
> ------------------------------------------------------------
> The first line holds the schema, and col2 has a blank before it,
> so the corresponding column name in the generated DataFrame's schema keeps that blank.
> This can cause problems. For example,
> df.select("col2") 
> cannot find the column; you must use
> df.select(" col2") 
> instead. If the DataFrame is registered as a table, a query cannot select col2 either:
> df.registerTempTable("tab1");
> sqlContext.sql("select col2 from tab1"); //will fail
> Column names should be validated when loading a CSV file whose first line is a header.
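
As a possible workaround until the reader handles this, the column names can be trimmed after loading. The sketch below is only an illustration, not the change made in the pull request: HeaderNames.trimAll is a hypothetical helper, and the header string is copied from the example file above. Splitting on commas ignores CSV quoting, which is enough for a header like this one.

```scala
// Hypothetical helper, not part of Spark: trims whitespace around each header name.
object HeaderNames {
  def trimAll(names: Seq[String]): Seq[String] = names.map(_.trim)
}

// Header line from the example file, split naively on commas
// (quoted fields are not handled; this is an assumption made for brevity).
val rawHeader: Seq[String] = "col1, col2,col3,col4,col5".split(",").toSeq

// The second name still carries the leading blank described in the report.
val cleaned: Seq[String] = HeaderNames.trimAll(rawHeader)

println(rawHeader.map(n => s"[$n]").mkString(" "))
println(cleaned.map(n => s"[$n]").mkString(" "))

// In spark-shell, the same idea applied to the DataFrame from the report
// (toDF with new column names is an existing DataFrame method):
// val fixed = df.toDF(df.columns.map(_.trim): _*)
// fixed.select("col2")
```

Renaming after the load leaves the reader itself unchanged, so it does not address the validation requested in the issue; it only makes select("col2") and SQL queries against the registered table resolve the trimmed name.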



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org

