From: Andy Davidson <Andy@SantaCruzIntegration.com>
To: "user @spark" <user@spark.apache.org>
Date: Thu, 18 Aug 2016 14:56:16 -0700
Subject: pyspark unable to create UDF: java.lang.RuntimeException: org.apache.hadoop.fs.FileAlreadyExistsException: Parent path is not a directory: /tmp tmp

For some unknown reason I cannot create a UDF when I run the attached notebook on my cluster. I get the following error:

Py4JJavaError: An error occurred while calling None.org.apache.spark.sql.hive.HiveContext.
: java.lang.RuntimeException: org.apache.hadoop.fs.FileAlreadyExistsException: Parent path is not a directory: /tmp tmp

The notebook runs fine on my Mac, and in general I am able to run non-UDF Spark code without any trouble.

I start the notebook server as the user "ec2-user" and use the master URL
spark://ec2-51-215-120-63.us-west-1.compute.amazonaws.com:6066

I found the following messages in the notebook server log file (my log level is set to warn):

16/08/18 21:38:45 WARN ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 1.2.0
16/08/18 21:38:45 WARN ObjectStore: Failed to get database default, returning NoSuchObjectException

The cluster was originally created using spark-1.6.1-bin-hadoop2.6/ec2/spark-ec2.

The first notebook cell:

#from pyspark.sql import SQLContext, HiveContext
#sqlContext = SQLContext(sc)

#from pyspark.sql import DataFrame
#from pyspark.sql import functions

from pyspark.sql.types import StringType
from pyspark.sql.functions import udf

print("spark version: {}".format(sc.version))

import sys
print("python version: {}".format(sys.version))

Its output:

spark version: 1.6.1
python version: 3.4.3 (default, Apr  1 2015, 18:10:40)
[GCC 4.8.2 20140120 (Red Hat 4.8.2-16)]

The second cell, which fails:

# functions.lower() raises
# py4j.Py4JException: Method lower([class java.lang.String]) does not exist
# work around: define a UDF
toLowerUDFRetType = StringType()
#toLowerUDF = udf(lambda s : s.lower(), toLowerUDFRetType)
toLowerUDF = udf(lambda s : s.lower(), StringType())

Its output:

You must build Spark with Hive.
Export 'SPARK_HIVE=true' and run build/sbt assembly

Py4JJavaErrorTraceback (most recent call last)
<ipython-input-2-2e0f7c0bb4f9> in <module>()
      4 toLowerUDFRetType = StringType()
      5 #toLowerUDF = udf(lambda s : s.lower(), toLowerUDFRetType)
----> 6 toLowerUDF = udf(lambda s : s.lower(), StringType())

/root/spark/python/pyspark/sql/functions.py in udf(f, returnType)
   1595     [Row(slen=5), Row(slen=3)]
   1596     """
-> 1597     return UserDefinedFunction(f, returnType)
   1598 
   1599 blacklist = ['map', 'since', 'ignore_unicode_prefix']

/root/spark/python/pyspark/sql/functions.py in __init__(self, func, returnType, name)
   1556         self.returnType = returnType
   1557         self._broadcast = None
-> 1558         self._judf = self._create_judf(name)
   1559 
   1560     def _create_judf(self, name):

/root/spark/python/pyspark/sql/functions.py in _create_judf(self, name)
   1567         pickled_command, broadcast_vars, env, includes = _prepare_for_python_RDD(sc, command, self)
   1568         ctx = SQLContext.getOrCreate(sc)
-> 1569         jdt = ctx._ssql_ctx.parseDataType(self.returnType.json())
   1570         if name is None:
   1571             name = f.__name__ if hasattr(f, '__name__') else f.__class__.__name__

/root/spark/python/pyspark/sql/context.py in _ssql_ctx(self)
    681         try:
    682             if not hasattr(self, '_scala_HiveContext'):
--> 683                 self._scala_HiveContext = self._get_hive_ctx()
    684             return self._scala_HiveContext
    685         except Py4JError as e:

/root/spark/python/pyspark/sql/context.py in _get_hive_ctx(self)
    690 
    691     def _get_hive_ctx(self):
--> 692         return self._jvm.HiveContext(self._jsc.sc())
    693 
    694     def refreshTable(self, tableName):

/root/spark/python/lib/py4j-0.9-src.zip/py4j/java_gateway.py in __call__(self, *args)
   1062         answer = self._gateway_client.send_command(command)
   1063         return_value = get_return_value(
-> 1064             answer, self._gateway_client, None, self._fqn)
   1065 
   1066         for temp_arg in temp_args:

/root/spark/python/pyspark/sql/utils.py in deco(*a, **kw)
     43     def deco(*a, **kw):
     44         try:
---> 45             return f(*a, **kw)
     46         except py4j.protocol.Py4JJavaError as e:
     47             s = e.java_exception.toString()

/root/spark/python/lib/py4j-0.9-src.zip/py4j/protocol.py in get_return_value(answer, gateway_client, target_id, name)
    306                 raise Py4JJavaError(
    307                     "An error occurred while calling {0}{1}{2}.\n".
--> 308                     format(target_id, ".", name), value)
    309             else:
    310                 raise Py4JError(

Py4JJavaError: An error occurred while calling None.org.apache.spark.sql.hive.HiveContext.
: java.lang.RuntimeException: org.apache.hadoop.fs.FileAlreadyExistsException: Parent path is not a directory: /tmp tmp
	at org.apache.hadoop.hdfs.server.namenode.FSDirectory.mkdirs(FSDirectory.java:1489)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsInternal(FSNamesystem.java:2979)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsInt(FSNamesystem.java:2932)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirs(FSNamesystem.java:2911)
	at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.mkdirs(NameNodeRpcServer.java:649)
	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.mkdirs(ClientNamenodeProtocolServerSideTranslatorPB.java:417)
	at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:44096)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1002)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1695)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1691)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1689)

	at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:522)
	at org.apache.spark.sql.hive.client.ClientWrapper.<init>(ClientWrapper.scala:204)
	at org.apache.spark.sql.hive.client.IsolatedClientLoader.createClient(IsolatedClientLoader.scala:238)
	at org.apache.spark.sql.hive.HiveContext.executionHive$lzycompute(HiveContext.scala:218)
	at org.apache.spark.sql.hive.HiveContext.executionHive(HiveContext.scala:208)
	at org.apache.spark.sql.hive.HiveContext.functionRegistry$lzycompute(HiveContext.scala:462)
	at org.apache.spark.sql.hive.HiveContext.functionRegistry(HiveContext.scala:461)
	at org.apache.spark.sql.UDFRegistration.<init>(UDFRegistration.scala:40)
	at org.apache.spark.sql.SQLContext.<init>(SQLContext.scala:330)
	at org.apache.spark.sql.hive.HiveContext.<init>(HiveContext.scala:90)
	at org.apache.spark.sql.hive.HiveContext.<init>(HiveContext.scala:101)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:234)
	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:381)
	at py4j.Gateway.invoke(Gateway.java:214)
	at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:79)
	at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:68)
	at py4j.GatewayConnection.run(GatewayConnection.java:209)
	at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.hadoop.fs.FileAlreadyExistsException: Parent path is not a directory: /tmp tmp
	at org.apache.hadoop.hdfs.server.namenode.FSDirectory.mkdirs(FSDirectory.java:1489)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsInternal(FSNamesystem.java:2979)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsInt(FSNamesystem.java:2932)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirs(FSNamesystem.java:2911)
	at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.mkdirs(NameNodeRpcServer.java:649)
	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.mkdirs(ClientNamenodeProtocolServerSideTranslatorPB.java:417)
	at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:44096)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1002)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1695)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1691)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1689)

	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
	at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:90)
	at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:57)
	at org.apache.hadoop.hdfs.DFSClient.primitiveMkdir(DFSClient.java:2110)
	at org.apache.hadoop.hdfs.DFSClient.mkdirs(DFSClient.java:2079)
	at org.apache.hadoop.hdfs.DistributedFileSystem.mkdirs(DistributedFileSystem.java:543)
	at org.apache.hadoop.hive.ql.exec.Utilities.createDirsWithPermission(Utilities.java:3679)
	at org.apache.hadoop.hive.ql.session.SessionState.createRootHDFSDir(SessionState.java:597)
	at org.apache.hadoop.hive.ql.session.SessionState.createSessionDirs(SessionState.java:554)
	at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:508)
	... 21 more
Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.fs.FileAlreadyExistsException): Parent path is not a directory: /tmp tmp
	at org.apache.hadoop.hdfs.server.namenode.FSDirectory.mkdirs(FSDirectory.java:1489)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsInternal(FSNamesystem.java:2979)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsInt(FSNamesystem.java:2932)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirs(FSNamesystem.java:2911)
	at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.mkdirs(NameNodeRpcServer.java:649)
	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.mkdirs(ClientNamenodeProtocolServerSideTranslatorPB.java:417)
	at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:44096)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1002)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1695)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1691)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1689)

	at org.apache.hadoop.ipc.Client.call(Client.java:1225)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
	at com.sun.proxy.$Proxy21.mkdirs(Unknown Source)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:164)
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:83)
	at com.sun.proxy.$Proxy21.mkdirs(Unknown Source)
	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.mkdirs(ClientNamenodeProtocolTranslatorPB.java:425)
	at org.apache.hadoop.hdfs.DFSClient.primitiveMkdir(DFSClient.java:2108)
	... 27 more
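
Some notes from reading the trace: the lambda itself never runs. udf() has to parse the return type, which makes pyspark create the Scala HiveContext lazily (_ssql_ctx -> _get_hive_ctx), and it is that constructor that fails. The root cause is the NameNode rejecting a mkdirs() call issued from SessionState.createRootHDFSDir, i.e. Hive cannot create its scratch directory on HDFS. The wording "Parent path is not a directory: /tmp tmp" looks like /tmp on the cluster's HDFS exists as a regular file, so nothing can be created underneath it, though I am not certain of that.

A minimal sketch that I believe should reproduce the failure without any UDF (run in the same pyspark 1.6 notebook, where sc is already defined):

# force creation of the Scala HiveContext directly; if this raises the
# same FileAlreadyExistsException, the problem is HiveContext / Hive
# scratch-dir initialization, not udf() itself
from pyspark.sql import HiveContext

hiveCtx = HiveContext(sc)    # lazy in pyspark 1.6, does not touch the JVM yet
hiveCtx.sql("SHOW TABLES")   # forces _get_hive_ctx(); expected to fail here

If that reproduces it, running "hdfs dfs -ls /" on the cluster should show whether /tmp is a file rather than a directory; the usual suggestion I have seen is to recreate it as a world-writable directory (hdfs dfs -rm /tmp; hdfs dfs -mkdir /tmp; hdfs dfs -chmod 1777 /tmp), though I have not verified that on a spark-ec2 cluster.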
[Attachment: udfBug.ipynb]
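
P.S. On the functions.lower() comment in the second cell: the py4j error "Method lower([class java.lang.String]) does not exist" is what Spark 1.6 raises when a plain Python string is passed through to the JVM, because the Scala side only defines lower(Column). A sketch of what I believe is the intended call, using a hypothetical DataFrame df with a string column "name", needs no UDF at all:

from pyspark.sql.functions import col, lower

# hypothetical df with a string column "name"; pass a Column, not a str
lowered = df.select(lower(col("name")).alias("name_lower"))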