spark-issues mailing list archives

From "Kuba Tyszko (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (SPARK-18906) CSV parser should return null for empty (or with "") numeric columns.
Date Fri, 16 Dec 2016 21:43:58 GMT

     [ https://issues.apache.org/jira/browse/SPARK-18906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kuba Tyszko updated SPARK-18906:
--------------------------------
    Description: 
Spark allows the user to set a nullValue option that specifies a string to be translated to null;
for example, the string "NA".
Data sources that use such a nullValue but also have other columns that may contain empty values
may not be parsed correctly.
The change resolves this by assuming the following:
when a column is inferred as numeric, its field is set to null whenever parsing fails, for example
upon seeing an empty value or an empty string.
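The rule above can be sketched in plain Python. This is not Spark's CSV parser; the helper name parse_nullable_int and the default null_value="NA" are hypothetical, chosen only to mirror the description. It shows the proposed behavior for a column inferred as numeric: the configured nullValue, an empty field, or an empty quoted string all map to null (None), and any other conversion failure also yields null instead of an error.

```python
from typing import Optional


def parse_nullable_int(token: str, null_value: str = "NA") -> Optional[int]:
    """Hypothetical helper sketching the proposed rule for a column
    inferred as numeric: return None instead of raising when the
    field cannot be parsed as an integer."""
    # The user-configured nullValue always maps to null.
    if token == null_value:
        return None
    # Treat an empty field, or a literal pair of quotes (an empty quoted
    # string seen without quote handling), as null.
    stripped = token.strip()
    if stripped in ("", '""'):
        return None
    try:
        return int(stripped)
    except ValueError:
        # Proposed behavior: a parsing failure yields null, not an error.
        return None


if __name__ == "__main__":
    for raw in ["1", "", '""', "NA", "0"]:
        print(repr(raw), "->", parse_nullable_int(raw))
```

One design consequence worth noting: because every conversion failure becomes null, a genuinely malformed value such as "abc" in a numeric column would also be silently nulled rather than reported.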

Example:

----------------------
| char | int1 | int2 |
----------------------
| a    | 1    | 2    |
----------------------
| a    |      | 0    |
----------------------
| NA   | ""   | ""   |
----------------------

This example illustrates that column "char" may contain a missing value indicated as "NA",
column int1 has a "true null" value (an empty field in the second row), and in the third row
both int1 and int2 have an empty string ("") as their values.
In such a situation, parsing fails.
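The example can be reproduced with Python's standard csv module to show what the proposed rule would produce. This does not call Spark; EXAMPLE_CSV is an assumed on-disk form of the table above, and to_int_or_none and parse_example are hypothetical helpers. With the rule applied, every row parses, and the empty and "" numeric fields become None.

```python
import csv
import io
from typing import List, Optional

# The example table written as CSV text (an assumption about how the
# reporter's data looks on disk).
EXAMPLE_CSV = 'char,int1,int2\na,1,2\na,,0\nNA,"",""\n'

NULL_VALUE = "NA"  # the user-configured nullValue from the example


def to_int_or_none(field: str) -> Optional[int]:
    # nullValue, empty fields and failed conversions all become None (null).
    if field == NULL_VALUE or field == "":
        return None
    try:
        return int(field)
    except ValueError:
        return None


def parse_example(text: str) -> List[tuple]:
    reader = csv.reader(io.StringIO(text))
    next(reader)  # skip the header row: char,int1,int2
    rows = []
    for char, int1, int2 in reader:
        # csv.reader already turns "" (an empty quoted string) into "".
        char_value = None if char == NULL_VALUE else char
        rows.append((char_value, to_int_or_none(int1), to_int_or_none(int2)))
    return rows


if __name__ == "__main__":
    for row in parse_example(EXAMPLE_CSV):
        print(row)
```

Running it prints ('a', 1, 2), then ('a', None, 0), then (None, None, None): the second and third rows no longer cause a failure.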




  was:
Spark allows the user to set a nullValue option that specifies a string to be translated to null;
for example, the string "NA".
Data sources that use such a nullValue but also have other columns that may contain empty values
may not be parsed correctly.
The change resolves this by assuming the following:
when a column is inferred as numeric, its field is set to null whenever parsing fails, for example
upon seeing an empty value or an empty string.

Example:

---------------
|char|int1|int2
---------------
|a|1|2|
---------------
|a||0
---------------
|NA|""|""
----------------

This example illustrates that column "char" may contain a missing value indicated as "NA",
column int1 has a "true null" value (an empty field in the second row), and in the third row
both int1 and int2 have an empty string ("") as their values.
In such a situation, parsing fails.





> CSV parser should return null for empty (or with "") numeric columns.
> ---------------------------------------------------------------------
>
>                 Key: SPARK-18906
>                 URL: https://issues.apache.org/jira/browse/SPARK-18906
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.0.1
>            Reporter: Kuba Tyszko
>            Priority: Minor



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


