spark-dev mailing list archives

From SNEHASISH DUTTA <info.snehas...@gmail.com>
Subject CSV reader 2.2.0 issue
Date Mon, 05 Mar 2018 14:07:23 GMT
 Hi,

I am using the Spark 2.2 CSV reader.

I have data in the following format:

123|123|"abc"||""|"xyz"

The requirement is that || (an empty, unquoted field) has to be treated as null,
and "" has to be treated as an empty string of length 0.

I set the sep option to the pipe character (|) and the quote option to "".
After parsing the data, I used a regex and was able to fulfill all the
conditions mentioned above.
It started failing when column values like "|" appeared, i.e. the separator
itself became a column value. The Spark CSV reader treated this value as a
separator and produced extra columns.
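
To make the intended result concrete, here is a minimal sketch in plain Python (not Spark) of the field semantics described above. It assumes the double quote is the quoting character, and the function name split_record is hypothetical. It only illustrates the expected output and does not handle escaped or doubled quotes inside a field.

```python
from typing import List, Optional


def split_record(line: str, sep: str = "|", quote: str = '"') -> List[Optional[str]]:
    """Split one record (hypothetical helper, not a Spark API).

    An empty unquoted field becomes None, a quoted empty field becomes '',
    and a separator inside quotes is kept as part of the field value.
    Escaped or doubled quotes are NOT handled in this sketch.
    """
    fields: List[Optional[str]] = []
    buf: List[str] = []
    in_quotes = False    # True while we are between an opening and closing quote
    was_quoted = False   # True once the current field has seen a quote character

    def flush() -> None:
        text = "".join(buf)
        # Only an empty field that was never quoted counts as null.
        fields.append(text if (was_quoted or text) else None)

    for ch in line:
        if ch == quote:
            in_quotes = not in_quotes
            was_quoted = True
        elif ch == sep and not in_quotes:
            flush()
            buf, was_quoted = [], False
        else:
            buf.append(ch)

    flush()  # last field has no trailing separator
    return fields


print(split_record('123|123|"abc"||""|"xyz"'))
print(split_record('123|"|"||""'))
```

The first call uses the sample record from above; the second adds the case where the separator itself is a quoted column value, which should stay a single column.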

After this I used the escape option on "|", but the results were similar.

I then tried reading the data as a Dataset and splitting on "\\|", which had a similar outcome.

Is there any way to resolve this with the CSV reader?


Thanks and Regards,
Snehasish
