spark-issues mailing list archives

From "Herman van Hovell (JIRA)" <>
Subject [jira] [Issue Comment Deleted] (SPARK-17335) Creating Hive table from Spark data
Date Thu, 01 Sep 2016 14:55:20 GMT


Herman van Hovell updated SPARK-17335:
    Comment: was deleted

(was: Turns out this is not correct, StructType handles this correctly.

[~jupblb] Could you check the log for the following message:
16/09/01 16:41:45 WARN Utils: Truncated the string representation of a plan since it was too
large. This behavior can be adjusted by setting 'spark.debug.maxToStringFields' in SparkEnv.conf.)

> Creating Hive table from Spark data
> -----------------------------------
>                 Key: SPARK-17335
>                 URL:
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.0.0
>            Reporter: Michal Kielbowicz
> Recently my team started using Spark for analysis of huge JSON objects. Spark itself
handles them well. The problem starts when we try to create a Hive table from them using the
steps from this part of the documentation:
> After running the command `spark.sql("CREATE TABLE x AS (SELECT * FROM y)")` we get the
following exception (sorry for obfuscating, confidential data):
> {code}
> Exception in thread "main" org.apache.spark.sql.AnalysisException: org.apache.hadoop.hive.ql.metadata.HiveException:
java.lang.IllegalArgumentException: Error: : expected at the position 993 of 'string:struct<a:boolean,b:array<string>,c:boolean,d:struct<e:boolean,f:boolean,[...(few
others)],z:boolean,... 4 more fields>,[...(rest of valid struct string)]>' but ' ' is
> {code}
> It turned out that the exception was raised because of the `... 4 more fields` part, as it
is not a valid representation of the data structure.
> An easy workaround is to set `spark.debug.maxToStringFields` to some large value. Nevertheless,
this shouldn't be required; the stringification process should use a method that produces a
struct string Hive can actually parse.
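A minimal sketch of that workaround, assuming the job is launched via spark-shell (the value 1000 is arbitrary; pick anything larger than the widest struct in the schema):

```shell
# Raise the plan/schema string truncation limit so wide structs are not
# abbreviated to "... N more fields" (hypothetical value, tune as needed).
spark-shell --conf spark.debug.maxToStringFields=1000
```

The same key can be set on an existing SparkConf before the session is created; it only affects string rendering, not query semantics.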
> In my opinion the root problem is here:
the `simpleString` method is called instead of `catalogString`. However, this class is used
in many places and I don't feel experienced enough with Spark to submit a PR myself.
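The distinction can be sketched as follows (a hedged illustration, assuming Spark SQL is on the classpath; the 30-column schema and names c1..c30 are made up for the example):

```scala
import org.apache.spark.sql.types._

// Illustrative wide schema: 30 boolean columns.
val schema = StructType((1 to 30).map(i => StructField(s"c$i", BooleanType)))

// simpleString is meant for logs and plan output: beyond
// spark.debug.maxToStringFields columns it abbreviates the struct
// (e.g. "... N more fields"), which Hive's type parser rejects.
println(schema.simpleString)

// catalogString is never truncated; it is the full struct<...> form
// that should be handed to Hive.
println(schema.catalogString)
```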
> We believe this issue is indirectly caused by this PR:
> There has been almost the same issue in the past. You can find it here:

This message was sent by Atlassian JIRA

To unsubscribe, e-mail:
For additional commands, e-mail:
