From issues-return-212022-archive-asf-public=cust-asf.ponee.io@spark.apache.org Tue Dec 18 14:07:13 2018
Date: Tue, 18 Dec 2018 13:07:00 +0000 (UTC)
From: "Sean Owen (JIRA)"
To: issues@spark.apache.org
Subject: [jira] [Resolved] (SPARK-20712) [SPARK 2.1 REGRESSION][SQL] Spark can't read Hive table when column type has length greater than 4000 bytes

     [ https://issues.apache.org/jira/browse/SPARK-20712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean Owen resolved SPARK-20712.
-------------------------------
    Resolution: Cannot Reproduce

> [SPARK 2.1 REGRESSION][SQL] Spark can't read Hive table when column type has length greater than 4000 bytes
> ------------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-20712
>                 URL: https://issues.apache.org/jira/browse/SPARK-20712
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.1.1, 2.1.2, 2.2.0, 2.3.0
>            Reporter: Maciej Bryński
>            Priority: Critical
>
> Hi,
> I have the following issue.
> I'm trying to read a table from Hive where one of the columns is nested, so its schema string is longer than 4000 bytes.
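>
> A quick way to spot tables at risk, sketched minimally here (this snippet is an illustration, not code from this report; it assumes a pyspark shell with the usual {{spark}} session and a DataFrame {{df}} about to be saved): flag any column whose Hive-style type string crosses the 4000-character mark at which the metastore entry gets truncated.
> {code}
> # Sketch: warn about columns whose type string exceeds 4000 characters,
> # the length at which the truncated metastore entry later fails to parse
> # (see the "pos 4000" error below). `spark` and `df` are assumed to exist.
> for field in df.schema.fields:
>     type_str = field.dataType.simpleString()  # e.g. "struct<a:bigint,...>"
>     if len(type_str) > 4000:
>         print("column %s: %d-char type string; reads may fail on 2.1.x"
>               % (field.name, len(type_str)))
> {code}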
> Everything worked on Spark 2.0.2. On 2.1.1 I'm getting this exception:
> {code}
> >>> spark.read.table("SOME_TABLE")
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "/opt/spark-2.1.1/python/pyspark/sql/readwriter.py", line 259, in table
>     return self._df(self._jreader.table(tableName))
>   File "/opt/spark-2.1.1/python/lib/py4j-0.10.4-src.zip/py4j/java_gateway.py", line 1133, in __call__
>   File "/opt/spark-2.1.1/python/pyspark/sql/utils.py", line 63, in deco
>     return f(*a, **kw)
>   File "/opt/spark-2.1.1/python/lib/py4j-0.10.4-src.zip/py4j/protocol.py", line 319, in get_return_value
> py4j.protocol.Py4JJavaError: An error occurred while calling o71.table.
> : org.apache.spark.SparkException: Cannot recognize hive type string: SOME_VERY_LONG_FIELD_TYPE
>         at org.apache.spark.sql.hive.client.HiveClientImpl.org$apache$spark$sql$hive$client$HiveClientImpl$$fromHiveColumn(HiveClientImpl.scala:789)
>         at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$getTableOption$1$$anonfun$apply$11$$anonfun$7.apply(HiveClientImpl.scala:365)
>         at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$getTableOption$1$$anonfun$apply$11$$anonfun$7.apply(HiveClientImpl.scala:365)
>         at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>         at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>         at scala.collection.Iterator$class.foreach(Iterator.scala:893)
>         at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
>         at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
>         at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
>         at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
>         at scala.collection.AbstractTraversable.map(Traversable.scala:104)
>         at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$getTableOption$1$$anonfun$apply$11.apply(HiveClientImpl.scala:365)
>         at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$getTableOption$1$$anonfun$apply$11.apply(HiveClientImpl.scala:361)
>         at scala.Option.map(Option.scala:146)
>         at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$getTableOption$1.apply(HiveClientImpl.scala:361)
>         at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$getTableOption$1.apply(HiveClientImpl.scala:359)
>         at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$withHiveState$1.apply(HiveClientImpl.scala:279)
>         at org.apache.spark.sql.hive.client.HiveClientImpl.liftedTree1$1(HiveClientImpl.scala:226)
>         at org.apache.spark.sql.hive.client.HiveClientImpl.retryLocked(HiveClientImpl.scala:225)
>         at org.apache.spark.sql.hive.client.HiveClientImpl.withHiveState(HiveClientImpl.scala:268)
>         at org.apache.spark.sql.hive.client.HiveClientImpl.getTableOption(HiveClientImpl.scala:359)
>         at org.apache.spark.sql.hive.client.HiveClient$class.getTable(HiveClient.scala:74)
>         at org.apache.spark.sql.hive.client.HiveClientImpl.getTable(HiveClientImpl.scala:78)
>         at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$org$apache$spark$sql$hive$HiveExternalCatalog$$getRawTable$1.apply(HiveExternalCatalog.scala:118)
>         at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$org$apache$spark$sql$hive$HiveExternalCatalog$$getRawTable$1.apply(HiveExternalCatalog.scala:118)
>         at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:97)
>         at org.apache.spark.sql.hive.HiveExternalCatalog.org$apache$spark$sql$hive$HiveExternalCatalog$$getRawTable(HiveExternalCatalog.scala:117)
>         at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$getTable$1.apply(HiveExternalCatalog.scala:628)
>         at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$getTable$1.apply(HiveExternalCatalog.scala:628)
>         at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:97)
>         at org.apache.spark.sql.hive.HiveExternalCatalog.getTable(HiveExternalCatalog.scala:627)
>         at org.apache.spark.sql.hive.HiveMetastoreCatalog.lookupRelation(HiveMetastoreCatalog.scala:124)
>         at org.apache.spark.sql.hive.HiveSessionCatalog.lookupRelation(HiveSessionCatalog.scala:70)
>         at org.apache.spark.sql.DataFrameReader.table(DataFrameReader.scala:473)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:497)
>         at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
>         at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
>         at py4j.Gateway.invoke(Gateway.java:280)
>         at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
>         at py4j.commands.CallCommand.execute(CallCommand.java:79)
>         at py4j.GatewayConnection.run(GatewayConnection.java:214)
>         at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.spark.sql.catalyst.parser.ParseException:
> mismatched input '<EOF>' expecting ':'(line 1, pos 4000)
> {code}
> EDIT:
> A way to reproduce this error (from pyspark):
> {code}
> >>> spark.range(10).selectExpr(*(map(lambda x: "id as very_long_column_name_id" + str(x), range(200)))).selectExpr("struct(*) as nested").write.saveAsTable("test")
> >>> spark.read.table("test")
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "/opt/spark-2.1.1/python/pyspark/sql/readwriter.py", line 259, in table
>     return self._df(self._jreader.table(tableName))
>   File "/opt/spark-2.1.1/python/lib/py4j-0.10.4-src.zip/py4j/java_gateway.py", line 1133, in __call__
>   File "/opt/spark-2.1.1/python/pyspark/sql/utils.py", line 63, in deco
>     return f(*a, **kw)
>   File "/opt/spark-2.1.1/python/lib/py4j-0.10.4-src.zip/py4j/protocol.py", line 319, in get_return_value
> py4j.protocol.Py4JJavaError: An error occurred while calling o260.table.
> : org.apache.spark.SparkException: Cannot recognize hive type string: struct<...>
>         ... (identical stack trace as in the first example) ...
> Caused by: org.apache.spark.sql.catalyst.parser.ParseException:
> mismatched input '<EOF>' expecting ':'(line 1, pos 4000)
> {code}
> From Spark 2.0.2:
> {code}
> >>> spark.read.table("test")
> DataFrame[nested: struct<...>]
> {code}
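>
> One hedged workaround sketch (not from the original report; {{test_flat}} is a made-up table name): flatten the struct before saving, so each column's type string stays far below 4000 characters and nothing gets truncated in the metastore.
> {code}
> # Sketch: write the same 200 columns without nesting them in a struct;
> # every column's type string is then just "bigint".
> df = spark.range(10).selectExpr(
>     *["id as very_long_column_name_id" + str(i) for i in range(200)])
> df.write.saveAsTable("test_flat")   # hypothetical table name
> spark.read.table("test_flat")       # no oversized type string to parse
> {code}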

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org