spark-issues mailing list archives

From "Hyukjin Kwon (JIRA)" <>
Subject [jira] [Commented] (SPARK-27609) from_json expects values of options dictionary to be
Date Fri, 03 May 2019 08:38:00 GMT


Hyukjin Kwon commented on SPARK-27609:

Okay, we can match it to {{read.option(...)}}.

> from_json expects values of options dictionary to be 
> -----------------------------------------------------
>                 Key: SPARK-27609
>                 URL:
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 2.2.1
>         Environment: I've found this issue on an AWS Glue development endpoint which
is running Spark 2.2.1 and being given jobs through a SparkMagic Python 2 kernel, running
through livy and all that. I don't know how much of that is important for reproduction, and
can get more details if needed. 
>            Reporter: Zachary Jablons
>            Priority: Minor
> When reading a column of a DataFrame that consists of serialized JSON, one of the options
for inferring the schema and then parsing the JSON is a two-step process:
> {code}
> from pyspark.sql.functions import from_json
> # assumes `spark` is the active SparkSession
> # this results in a new dataframe where the top-level keys of the JSON
> # are columns
> df_parsed_direct = spark.read.json(df.rdd.map(lambda row: row.json_col))
> # this does that while preserving the rest of df
> schema = df_parsed_direct.schema
> df_parsed = df.withColumn('parsed', from_json(df.json_col, schema))
> {code}
> When I do this, I sometimes find myself passing in options. My understanding is, from
the documentation [here|], that
the nature of these options passed should be the same whether I do
> {code}
> spark.read.option('option', value)
> {code}
> or
> {code}
> from_json(df.json_col, schema, options={'option':value})
> {code}
> However, I've found that the latter expects value to be a string representation of the
value that can be decoded by JSON. So, for example, {{options={'multiLine':True}}} fails with
> {code}
> java.lang.ClassCastException: java.lang.Boolean cannot be cast to java.lang.String
> {code}
> whereas {{options={'multiLine':'true'}}} works just fine.
> Notably, providing {{read.option('multiLine', True)}} works fine!
> The code for reproducing this issue as well as the stacktrace from hitting it are provided
in [this gist|]. 
> I also noticed that from_json doesn't complain if you give it a garbage option key –
but that seems separate.
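
Until the two code paths agree, one possible workaround on affected versions is to convert every option value to a string before handing the dictionary to from_json. The sketch below is only an illustration: the helper name stringify_options is hypothetical and not part of PySpark, and it assumes that lowercase 'true'/'false' strings are accepted, as the report above observed for 'multiLine'.

```python
def stringify_options(options):
    """Return a copy of `options` with every value converted to a string.

    Hypothetical helper, not part of PySpark. Booleans become lowercase
    'true'/'false', matching the form the report found to work.
    """
    result = {}
    for key, value in options.items():
        if isinstance(value, bool):
            # str(True) would give 'True'; use the lowercase form instead
            result[key] = "true" if value else "false"
        else:
            result[key] = str(value)
    return result


safe_options = stringify_options({"multiLine": True, "mode": "PERMISSIVE"})
print(safe_options)
```

The resulting dictionary could then be passed as from_json(df.json_col, schema, options=safe_options), where df and schema are the names used in the report.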

