MIME-Version: 1.0
In-Reply-To: <1445152244061-25111.post@n3.nabble.com>
References: <1445152244061-25111.post@n3.nabble.com>
Date: Sun, 18 Oct 2015 08:50:34 -0700
Subject: Re: callUdf("percentile_approx",col("mycol"),lit(0.25)) does not compile spark 1.5.1 source but it does work in spark 1.5.1 bin
From: Ted Yu
To: unk1102
Cc: user
Content-Type: multipart/alternative; boundary=001a1140c11a3d6bbd052262fdcb

--001a1140c11a3d6bbd052262fdcb
Content-Type: text/plain; charset=UTF-8

The UDF is defined in GenericUDAFPercentileApprox of Hive.

When spark-shell runs, it has access to the above class, which is packaged in
assembly/target/scala-2.10/spark-assembly-1.6.0-SNAPSHOT-hadoop2.7.0.jar :

  2143 Fri Oct 16 15:02:26 PDT 2015 org/apache/hadoop/hive/ql/udf/generic/GenericUDAFPercentileApprox$1.class
  4602 Fri Oct 16 15:02:26 PDT 2015 org/apache/hadoop/hive/ql/udf/generic/GenericUDAFPercentileApprox$GenericUDAFMultiplePercentileApproxEvaluator.class
  1697 Fri Oct 16 15:02:26 PDT 2015 org/apache/hadoop/hive/ql/udf/generic/GenericUDAFPercentileApprox$GenericUDAFPercentileApproxEvaluator$PercentileAggBuf.class
  6570 Fri Oct 16 15:02:26 PDT 2015 org/apache/hadoop/hive/ql/udf/generic/GenericUDAFPercentileApprox$GenericUDAFPercentileApproxEvaluator.class
  4334 Fri Oct 16 15:02:26 PDT 2015 org/apache/hadoop/hive/ql/udf/generic/GenericUDAFPercentileApprox$GenericUDAFSinglePercentileApproxEvaluator.class
  6293 Fri Oct 16 15:02:26 PDT 2015 org/apache/hadoop/hive/ql/udf/generic/GenericUDAFPercentileApprox.class

That is the cause of the different behavior.

FYI

On Sun, Oct 18, 2015 at 12:10 AM, unk1102 wrote:

> Hi starting new thread following old thread looks like code for compiling
> callUdf("percentile_approx",col("mycol"),lit(0.25)) is not merged in spark
> 1.5.1 source but I dont understand why this function call works in Spark
> 1.5.1 spark-shell/bin. Please guide.
>
> ---------- Forwarded message ----------
> From: "Ted Yu"
> Date: Oct 14, 2015 3:26 AM
> Subject: Re: How to calculate percentile of a column of DataFrame?
> To: "Umesh Kacha"
> Cc: "Michael Armbrust",
> "<Saif.A.Ellafi@wellsfargo.com>",
> "user"
>
> I modified DataFrameSuite, in master branch, to call percentile_approx
> instead of simpleUDF :
>
> - deprecated callUdf in SQLContext
> - callUDF in SQLContext *** FAILED ***
>   org.apache.spark.sql.AnalysisException: undefined function percentile_approx;
>   at org.apache.spark.sql.catalyst.analysis.SimpleFunctionRegistry$$anonfun$2.apply(FunctionRegistry.scala:64)
>   at org.apache.spark.sql.catalyst.analysis.SimpleFunctionRegistry$$anonfun$2.apply(FunctionRegistry.scala:64)
>   at scala.Option.getOrElse(Option.scala:120)
>   at org.apache.spark.sql.catalyst.analysis.SimpleFunctionRegistry.lookupFunction(FunctionRegistry.scala:63)
>   at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveFunctions$$anonfun$apply$10$$anonfun$applyOrElse$5$$anonfun$applyOrElse$24.apply(Analyzer.scala:506)
>   at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveFunctions$$anonfun$apply$10$$anonfun$applyOrElse$5$$anonfun$applyOrElse$24.apply(Analyzer.scala:506)
>   at org.apache.spark.sql.catalyst.analysis.package$.withPosition(package.scala:48)
>   at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveFunctions$$anonfun$apply$10$$anonfun$applyOrElse$5.applyOrElse(Analyzer.scala:505)
>   at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveFunctions$$anonfun$apply$10$$anonfun$applyOrElse$5.applyOrElse(Analyzer.scala:502)
>   at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:227)
>
> SPARK-10671 is included.
> For 1.5.1, I guess the absence of SPARK-10671 means that SparkSQL treats
> percentile_approx as normal UDF.
>
> Experts can correct me, if there is any misunderstanding.
>
> Cheers
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/callUdf-percentile-approx-col-mycol-lit-0-25-does-not-compile-spark-1-5-1-source-but-it-does-work-inn-tp25111.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.

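One rough way to check the explanation above from inside a JVM (for example, the one running spark-shell or a test suite) is to ask the class loader whether the Hive UDAF class is visible. The sketch below is only an illustration and is not from the original thread: the class name is derived from the jar listing in this message, while UdafClasspathCheck and isOnClasspath are hypothetical names. It shows only whether the class can be loaded; it says nothing about how Spark SQL resolves function names.

```java
public class UdafClasspathCheck {

    // Fully qualified name derived from the jar listing quoted in this message.
    public static final String HIVE_UDAF =
        "org.apache.hadoop.hive.ql.udf.generic.GenericUDAFPercentileApprox";

    /** Returns true if the named class can be loaded without initializing it. */
    public static boolean isOnClasspath(String className) {
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        if (loader == null) {
            // Fall back to the system class loader when no context loader is set.
            loader = ClassLoader.getSystemClassLoader();
        }
        try {
            Class.forName(className, false, loader);
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // The class (or one of its dependencies) is not visible to this loader.
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(HIVE_UDAF + " on classpath: " + isOnClasspath(HIVE_UDAF));
    }
}
```

Run with the assembly jar on the classpath, this would be expected to report the class as present; run against a classpath without the Hive classes, it should report it as absent.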
--001a1140c11a3d6bbd052262fdcb--