spark-issues mailing list archives

From "Hyukjin Kwon (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SPARK-23194) from_json in FAILFAST mode doesn't fail fast, instead it just returns nulls
Date Wed, 24 Jan 2018 01:27:00 GMT

    [ https://issues.apache.org/jira/browse/SPARK-23194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16336709#comment-16336709 ]

Hyukjin Kwon commented on SPARK-23194:
--------------------------------------

Yup, I think we don't support the parse modes in the JSON expressions so far; the current
behaviour has only _resembled_ PERMISSIVE. Supporting FAILFAST and PERMISSIVE makes sense in
general, but DROPMALFORMED is the problem because it basically means dropping records, which
an expression that returns one value per input row cannot do. I think we can support FAILFAST
and PERMISSIVE alone for now if it's well documented.
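
To make the distinction concrete, here is a minimal sketch in plain Python rather than Spark. The function {{from_json_sketch}} and its parameters are hypothetical and only model the proposed semantics for a single input value: PERMISSIVE yields nulls for a malformed record, FAILFAST raises, and DROPMALFORMED is rejected because an expression cannot remove rows.

```python
import json


def from_json_sketch(text, fields, mode="PERMISSIVE"):
    """Toy model of a JSON expression evaluated on one input value.

    Hypothetical helper for illustration only; this is not Spark's API.
    """
    if mode == "DROPMALFORMED":
        # An expression produces exactly one output per input row,
        # so it has no way to drop the row.
        raise ValueError("DROPMALFORMED is not supported in this sketch")
    if mode not in ("PERMISSIVE", "FAILFAST"):
        raise ValueError(f"unknown mode: {mode}")

    try:
        parsed = json.loads(text)
        if not isinstance(parsed, dict):
            raise ValueError("expected a JSON object")
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        if mode == "FAILFAST":
            raise  # surface the parse error instead of swallowing it
        # PERMISSIVE: keep the row, but with every field set to null.
        return {name: None for name in fields}

    return {name: parsed.get(name) for name in fields}
```

With this sketch, calling {{from_json_sketch('{"a": ', ["a"])}} returns a row whose only field is null, while the same call with {{mode="FAILFAST"}} raises the underlying parse error.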

Another thing is that the current behaviour of the JSON expressions doesn't completely follow
PERMISSIVE mode, IMHO. PERMISSIVE mode is expected to put malformed JSON records into the column
specified by {{columnNameOfCorruptRecord}} or {{spark.sql.columnNameOfCorruptRecord}}, and we
should take that into account.
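
As a rough illustration of that expectation, the plain-Python sketch below keeps the raw text of a malformed record in a dedicated column. The helper {{parse_permissive}} and its {{corrupt_column}} parameter are hypothetical names, not Spark's API; the default value mirrors Spark's default corrupt-record column name, {{_corrupt_record}}.

```python
import json


def parse_permissive(text, fields, corrupt_column="_corrupt_record"):
    """Toy model of PERMISSIVE parsing with a corrupt-record column.

    Hypothetical helper for illustration only; this is not Spark's API.
    """
    # Start from an all-null row, including the corrupt-record column.
    row = {name: None for name in fields}
    row[corrupt_column] = None

    try:
        parsed = json.loads(text)
        if not isinstance(parsed, dict):
            raise ValueError("expected a JSON object")
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        # Malformed input: data fields stay null, raw text is preserved.
        row[corrupt_column] = text
        return row

    for name in fields:
        row[name] = parsed.get(name)
    return row
```

A well-formed record fills the requested fields and leaves the corrupt-record column null; a malformed one does the opposite, so the original text is not lost.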

Also, I think we should be careful about the behaviour change. For example, if we parse with
{{ArrayType}} in PERMISSIVE mode, the result could end up as something like \[null, null\],
where our current code base simply returns {{null}}. I am also less sure how we should deal
with the malformed records in this case.

> from_json in FAILFAST mode doesn't fail fast, instead it just returns nulls
> ---------------------------------------------------------------------------
>
>                 Key: SPARK-23194
>                 URL: https://issues.apache.org/jira/browse/SPARK-23194
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.3.0
>            Reporter: Burak Yavuz
>            Priority: Major
>
> from_json accepts Json parsing options such as being PERMISSIVE to parsing errors or
> failing fast. It seems from the code that even though the default option is to fail fast,
> we catch that exception and return nulls.
>  
> In order to not change behavior, we should remove that try-catch block and change the
> default to permissive, but allow failfast mode to indeed fail.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org

