Date: Tue, 16 May 2017 12:49:04 +0000 (UTC)
From: Maciej Bryński (JIRA)
To: issues@spark.apache.org
Subject: [jira] [Commented] (SPARK-20712) [SQL] Spark can't read Hive table when column type has length greater than 4000 bytes

[ https://issues.apache.org/jira/browse/SPARK-20712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16012277#comment-16012277 ]

Maciej Bryński commented on SPARK-20712:
----------------------------------------

CC: [~jiangxb], [~hvanhovell]

> [SQL] Spark can't read Hive table when column type has length greater than 4000 bytes
> --------------------------------------------------------------------------------------
>
>              Key: SPARK-20712
>              URL: https://issues.apache.org/jira/browse/SPARK-20712
>          Project: Spark
>       Issue Type: Bug
>       Components: SQL
> Affects Versions: 2.1.1
>         Reporter: Maciej Bryński
>         Priority: Critical
>
> Hi,
> I have the following issue: I'm trying to read a table from Hive where one of the columns is nested, so its schema has a length greater than 4000 bytes.
> Everything worked on Spark 2.0.2.
> On 2.1.1 I'm getting an Exception:
> {code}
> >>> spark.read.table("SOME_TABLE")
> Traceback (most recent call last):
>   File "", line 1, in
>   File "/opt/spark-2.1.1/python/pyspark/sql/readwriter.py", line 259, in table
>     return self._df(self._jreader.table(tableName))
>   File "/opt/spark-2.1.1/python/lib/py4j-0.10.4-src.zip/py4j/java_gateway.py", line 1133, in __call__
>   File "/opt/spark-2.1.1/python/pyspark/sql/utils.py", line 63, in deco
>     return f(*a, **kw)
>   File "/opt/spark-2.1.1/python/lib/py4j-0.10.4-src.zip/py4j/protocol.py", line 319, in get_return_value
> py4j.protocol.Py4JJavaError: An error occurred while calling o71.table.
> : org.apache.spark.SparkException: Cannot recognize hive type string: SOME_VERY_LONG_FIELD_TYPE
>         at org.apache.spark.sql.hive.client.HiveClientImpl.org$apache$spark$sql$hive$client$HiveClientImpl$$fromHiveColumn(HiveClientImpl.scala:789)
>         at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$getTableOption$1$$anonfun$apply$11$$anonfun$7.apply(HiveClientImpl.scala:365)
>         at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$getTableOption$1$$anonfun$apply$11$$anonfun$7.apply(HiveClientImpl.scala:365)
>         at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>         at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>         at scala.collection.Iterator$class.foreach(Iterator.scala:893)
>         at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
>         at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
>         at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
>         at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
>         at scala.collection.AbstractTraversable.map(Traversable.scala:104)
>         at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$getTableOption$1$$anonfun$apply$11.apply(HiveClientImpl.scala:365)
>         at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$getTableOption$1$$anonfun$apply$11.apply(HiveClientImpl.scala:361)
>         at scala.Option.map(Option.scala:146)
>         at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$getTableOption$1.apply(HiveClientImpl.scala:361)
>         at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$getTableOption$1.apply(HiveClientImpl.scala:359)
>         at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$withHiveState$1.apply(HiveClientImpl.scala:279)
>         at org.apache.spark.sql.hive.client.HiveClientImpl.liftedTree1$1(HiveClientImpl.scala:226)
>         at org.apache.spark.sql.hive.client.HiveClientImpl.retryLocked(HiveClientImpl.scala:225)
>         at org.apache.spark.sql.hive.client.HiveClientImpl.withHiveState(HiveClientImpl.scala:268)
>         at org.apache.spark.sql.hive.client.HiveClientImpl.getTableOption(HiveClientImpl.scala:359)
>         at org.apache.spark.sql.hive.client.HiveClient$class.getTable(HiveClient.scala:74)
>         at org.apache.spark.sql.hive.client.HiveClientImpl.getTable(HiveClientImpl.scala:78)
>         at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$org$apache$spark$sql$hive$HiveExternalCatalog$$getRawTable$1.apply(HiveExternalCatalog.scala:118)
>         at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$org$apache$spark$sql$hive$HiveExternalCatalog$$getRawTable$1.apply(HiveExternalCatalog.scala:118)
>         at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:97)
>         at org.apache.spark.sql.hive.HiveExternalCatalog.org$apache$spark$sql$hive$HiveExternalCatalog$$getRawTable(HiveExternalCatalog.scala:117)
>         at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$getTable$1.apply(HiveExternalCatalog.scala:628)
>         at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$getTable$1.apply(HiveExternalCatalog.scala:628)
>         at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:97)
>         at org.apache.spark.sql.hive.HiveExternalCatalog.getTable(HiveExternalCatalog.scala:627)
>         at org.apache.spark.sql.hive.HiveMetastoreCatalog.lookupRelation(HiveMetastoreCatalog.scala:124)
>         at org.apache.spark.sql.hive.HiveSessionCatalog.lookupRelation(HiveSessionCatalog.scala:70)
>         at org.apache.spark.sql.DataFrameReader.table(DataFrameReader.scala:473)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:497)
>         at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
>         at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
>         at py4j.Gateway.invoke(Gateway.java:280)
>         at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
>         at py4j.commands.CallCommand.execute(CallCommand.java:79)
>         at py4j.GatewayConnection.run(GatewayConnection.java:214)
>         at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.spark.sql.catalyst.parser.ParseException:
> mismatched input '' expecting ':' (line 1, pos 4000)
> {code}
> EDIT:
> Way to reproduce this error (from pyspark):
> {code}
> >>> spark.range(10).selectExpr(*(map(lambda x: "id as very_long_column_name_id" + str(x), range(200)))).selectExpr("struct(*) as nested").write.saveAsTable("test")
> >>> spark.read.table("test")
> Traceback (most recent call last):
>   File "", line 1, in
>   File "/opt/spark-2.1.1/python/pyspark/sql/readwriter.py", line 259, in table
>     return self._df(self._jreader.table(tableName))
>   File "/opt/spark-2.1.1/python/lib/py4j-0.10.4-src.zip/py4j/java_gateway.py", line 1133, in __call__
>   File "/opt/spark-2.1.1/python/pyspark/sql/utils.py", line 63, in deco
>     return f(*a, **kw)
>   File "/opt/spark-2.1.1/python/lib/py4j-0.10.4-src.zip/py4j/protocol.py", line 319, in get_return_value
> py4j.protocol.Py4JJavaError: An error occurred while calling o260.table.
> : org.apache.spark.SparkException: Cannot recognize hive type string:struct
>         at org.apache.spark.sql.hive.client.HiveClientImpl.org$apache$spark$sql$hive$client$HiveClientImpl$$fromHiveColumn(HiveClientImpl.scala:789)
>         at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$getTableOption$1$$anonfun$apply$11$$anonfun$7.apply(HiveClientImpl.scala:365)
>         at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$getTableOption$1$$anonfun$apply$11$$anonfun$7.apply(HiveClientImpl.scala:365)
>         at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>         at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>         at scala.collection.Iterator$class.foreach(Iterator.scala:893)
>         at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
>         at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
>         at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
>         at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
>         at scala.collection.AbstractTraversable.map(Traversable.scala:104)
>         at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$getTableOption$1$$anonfun$apply$11.apply(HiveClientImpl.scala:365)
>         at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$getTableOption$1$$anonfun$apply$11.apply(HiveClientImpl.scala:361)
>         at scala.Option.map(Option.scala:146)
>         at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$getTableOption$1.apply(HiveClientImpl.scala:361)
>         at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$getTableOption$1.apply(HiveClientImpl.scala:359)
>         at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$withHiveState$1.apply(HiveClientImpl.scala:279)
>         at org.apache.spark.sql.hive.client.HiveClientImpl.liftedTree1$1(HiveClientImpl.scala:226)
>         at org.apache.spark.sql.hive.client.HiveClientImpl.retryLocked(HiveClientImpl.scala:225)
>         at org.apache.spark.sql.hive.client.HiveClientImpl.withHiveState(HiveClientImpl.scala:268)
>         at org.apache.spark.sql.hive.client.HiveClientImpl.getTableOption(HiveClientImpl.scala:359)
>         at org.apache.spark.sql.hive.client.HiveClient$class.getTable(HiveClient.scala:74)
>         at org.apache.spark.sql.hive.client.HiveClientImpl.getTable(HiveClientImpl.scala:78)
>         at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$org$apache$spark$sql$hive$HiveExternalCatalog$$getRawTable$1.apply(HiveExternalCatalog.scala:118)
>         at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$org$apache$spark$sql$hive$HiveExternalCatalog$$getRawTable$1.apply(HiveExternalCatalog.scala:118)
>         at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:97)
>         at org.apache.spark.sql.hive.HiveExternalCatalog.org$apache$spark$sql$hive$HiveExternalCatalog$$getRawTable(HiveExternalCatalog.scala:117)
>         at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$getTable$1.apply(HiveExternalCatalog.scala:628)
>         at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$getTable$1.apply(HiveExternalCatalog.scala:628)
>         at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:97)
>         at org.apache.spark.sql.hive.HiveExternalCatalog.getTable(HiveExternalCatalog.scala:627)
>         at org.apache.spark.sql.hive.HiveMetastoreCatalog.lookupRelation(HiveMetastoreCatalog.scala:124)
>         at org.apache.spark.sql.hive.HiveSessionCatalog.lookupRelation(HiveSessionCatalog.scala:70)
>         at org.apache.spark.sql.DataFrameReader.table(DataFrameReader.scala:473)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:497)
>         at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
>         at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
>         at py4j.Gateway.invoke(Gateway.java:280)
>         at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
>         at py4j.commands.CallCommand.execute(CallCommand.java:79)
>         at py4j.GatewayConnection.run(GatewayConnection.java:214)
>         at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.spark.sql.catalyst.parser.ParseException:
> mismatched input '' expecting ':' (line 1, pos 4000)
> {code}
> From Spark 2.0.2:
> {code}
> >>> spark.read.table("test")
> DataFrame[nested: struct]
> {code}

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org
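The "pos 4000" in the ParseException lines up with the repro above: the Hive type string for a struct of 200 `bigint` fields named `very_long_column_name_id0` through `very_long_column_name_id199` is far longer than 4000 characters, so a metastore backing column capped at 4000 bytes (an assumption about the underlying store, not something this report confirms) would hand Spark back a type string cut off mid-token. A minimal sketch of that arithmetic:

```python
# Rebuild the Hive type string the repro produces:
# struct<very_long_column_name_id0:bigint,...,very_long_column_name_id199:bigint>
# (field type bigint because spark.range() yields a bigint "id" column).
fields = ["very_long_column_name_id%d:bigint" % i for i in range(200)]
type_str = "struct<" + ",".join(fields) + ">"

print(len(type_str))  # 6897 -- well past 4000 characters

# Assumption: a 4000-byte store would return the string truncated mid-token.
truncated = type_str[:4000]
print(repr(truncated[-20:]))  # cut off inside a field name, before its ':'
```

A parser consuming the truncated string would run out of input at position 4000 while still expecting the rest of a `name:type` pair, which would explain the reported `mismatched input '' expecting ':' (line 1, pos 4000)`.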