spark-user mailing list archives

From Ewan Leith <ewan.le...@realitymine.com>
Subject Re: Spark SQL Nested Array of JSON with empty field
Date Sun, 05 Jun 2016 06:04:01 GMT
Spark's JSON reader is unforgiving of things like missing elements in some JSON records, or mixed types.

If you want to pass invalid JSON files through Spark, you're best doing an initial parse through
the Jackson APIs against a defined schema first. That lets you declare types like Option[String]
where a column is optional. Then convert the validated records back into JSON strings, and read
those strings as a DataFrame.
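
Something like this minimal sketch, assuming jackson-module-scala is on the classpath and one
JSON record per line (the names Person, Address, cleanJson and people.json are just illustrative):

import com.fasterxml.jackson.databind.ObjectMapper
import com.fasterxml.jackson.module.scala.DefaultScalaModule

case class Address(state: String, city: String)
case class Person(
  firstname: String,
  middlename: Option[String], // optional column, None when the field is absent
  lastname: String,
  address: Address)

// Parse each line with Jackson; drop records that fail validation,
// then re-serialise the survivors as clean JSON strings.
val cleanJson = sc.textFile("people.json").mapPartitions { lines =>
  val mapper = new ObjectMapper()
  mapper.registerModule(DefaultScalaModule)
  lines.flatMap { line =>
    scala.util.Try(mapper.readValue(line, classOf[Person])).toOption
      .map(p => mapper.writeValueAsString(p))
  }
}

// Read the validated strings back as a DataFrame; middlename comes
// through as a nullable column, so the query below no longer fails.
val df = sqlContext.read.json(cleanJson)
df.registerTempTable("jsontable")

Instead of dropping bad records in the flatMap, you could route them to a separate output for
inspection.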

Thanks,
Ewan

On 3 Jun 2016 22:03, Jerry Wong <jerry.king2.wong@gmail.com> wrote:
Hi,

I've run into a problem with a missing field in a nested JSON file with Spark SQL. For instance,
a JSON file contains the following two records:

{
  "firstname": "Jack",
  "lastname": "Nelson",
  "address": {
    "state": "New York",
    "city": "New York"
  }
}
{
  "firstname": "Landy",
  "middlename": "Ken",
  "lastname": "Yong",
  "address": {
    "state": "California",
    "city": "Los Angeles"
  }
}

I use Spark SQL to query the file like this:

val row = sqlContext.sql("SELECT firstname, middlename, lastname, address.state, address.city FROM jsontable")

Spark reports an error on the first record: there is no "middlename" field. How do I handle this
case in Spark SQL?

Many thanks in advance!
Jerry

