spark-user mailing list archives

From: Umesh Kacha <umesh.ka...@gmail.com>
Subject: Re: callUdf("percentile_approx",col("mycol"),lit(0.25)) does not compile spark 1.5.1 source but it does work in spark 1.5.1 bin
Date: Mon, 19 Oct 2015 11:00:25 GMT
Hi Ted, thanks much for your help, I really appreciate it. I tried the Maven
dependencies you mentioned, but callUdf still does not compile; please find a
snapshot of my IntelliJ editor attached. I am sorry, you may have to zoom into
the pictures, as I can't share the code. Thanks again.
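
P.S. Since I can't share the code itself, here is a rough sketch of the call
that fails to compile (my project is Java, but the shape is the same; the
DataFrame and column names are placeholders):

  import org.apache.spark.sql.functions._
  df.select(callUdf("percentile_approx", col("mycol"), lit(0.25)))
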
On Oct 19, 2015 8:32 AM, "Ted Yu" <yuzhihong@gmail.com> wrote:

> Umesh:
>
> $ jar tvf
> /home/hbase/.m2/repository/org/spark-project/hive/hive-exec/1.2.1.spark/hive-exec-1.2.1.spark.jar
> | grep GenericUDAFPercentile
>   2143 Fri Jul 31 23:51:48 PDT 2015
> org/apache/hadoop/hive/ql/udf/generic/GenericUDAFPercentileApprox$1.class
>   4602 Fri Jul 31 23:51:48 PDT 2015
> org/apache/hadoop/hive/ql/udf/generic/GenericUDAFPercentileApprox$GenericUDAFMultiplePercentileApproxEvaluator.class
>
> As long as the following dependency is in your pom.xml:
> [INFO] +- org.spark-project.hive:hive-exec:jar:1.2.1.spark:compile
>
> You should be able to invoke percentile_approx.
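>
> For reference, the corresponding pom.xml entry would look roughly like this
> (the version string is copied from the dependency line above; adjust it if
> your build resolves a different one):
>
>   <dependency>
>     <groupId>org.spark-project.hive</groupId>
>     <artifactId>hive-exec</artifactId>
>     <version>1.2.1.spark</version>
>   </dependency>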
>
> Cheers
>
> On Sun, Oct 18, 2015 at 8:58 AM, Umesh Kacha <umesh.kacha@gmail.com>
> wrote:
>
>> Thanks much, Ted. So when do we get to use this Spark UDF in Java code via
>> Maven dependencies? You said SPARK-10671 is not pushed as part of 1.5.1, so
>> it should be released in 1.6.0 as mentioned in the JIRA, right?
>>
>> On Sun, Oct 18, 2015 at 9:20 PM, Ted Yu <yuzhihong@gmail.com> wrote:
>>
>>> The UDF is defined in Hive's GenericUDAFPercentileApprox class.
>>>
>>> When spark-shell runs, it has access to the above class, which is packaged
>>> in assembly/target/scala-2.10/spark-assembly-1.6.0-SNAPSHOT-hadoop2.7.0.jar:
>>>
>>>   2143 Fri Oct 16 15:02:26 PDT 2015
>>> org/apache/hadoop/hive/ql/udf/generic/GenericUDAFPercentileApprox$1.class
>>>   4602 Fri Oct 16 15:02:26 PDT 2015
>>> org/apache/hadoop/hive/ql/udf/generic/GenericUDAFPercentileApprox$GenericUDAFMultiplePercentileApproxEvaluator.class
>>>   1697 Fri Oct 16 15:02:26 PDT 2015
>>> org/apache/hadoop/hive/ql/udf/generic/GenericUDAFPercentileApprox$GenericUDAFPercentileApproxEvaluator$PercentileAggBuf.class
>>>   6570 Fri Oct 16 15:02:26 PDT 2015
>>> org/apache/hadoop/hive/ql/udf/generic/GenericUDAFPercentileApprox$GenericUDAFPercentileApproxEvaluator.class
>>>   4334 Fri Oct 16 15:02:26 PDT 2015
>>> org/apache/hadoop/hive/ql/udf/generic/GenericUDAFPercentileApprox$GenericUDAFSinglePercentileApproxEvaluator.class
>>>   6293 Fri Oct 16 15:02:26 PDT 2015
>>> org/apache/hadoop/hive/ql/udf/generic/GenericUDAFPercentileApprox.class
>>>
>>> That was the cause of the different behavior.
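>>>
>>> As an illustration of the working side (a sketch, not verified here): in a
>>> spark-shell launched from an assembly that contains these classes (such as
>>> the jar above, or the 1.5.1 binary distribution), and assuming sqlContext
>>> is a HiveContext, a call of the shape from the subject line resolves at
>>> runtime, e.g. for a DataFrame df with a numeric column mycol:
>>>
>>>   import org.apache.spark.sql.functions._
>>>   df.select(callUdf("percentile_approx", col("mycol"), lit(0.25))).show()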
>>>
>>> FYI
>>>
>>> On Sun, Oct 18, 2015 at 12:10 AM, unk1102 <umesh.kacha@gmail.com> wrote:
>>>
>>>> Hi, starting a new thread following the old one. It looks like the code
>>>> needed for callUdf("percentile_approx",col("mycol"),lit(0.25)) to compile
>>>> is not merged into the Spark 1.5.1 source, but I don't understand why this
>>>> function call works in the Spark 1.5.1 spark-shell/bin. Please guide.
>>>>
>>>> ---------- Forwarded message ----------
>>>> From: "Ted Yu" <yuzhihong@gmail.com>
>>>> Date: Oct 14, 2015 3:26 AM
>>>> Subject: Re: How to calculate percentile of a column of DataFrame?
>>>> To: "Umesh Kacha" <umesh.kacha@gmail.com>
>>>> Cc: "Michael Armbrust" <michael@databricks.com>,
>>>> "&lt;Saif.A.Ellafi@wellsfargo.com&gt;" <Saif.A.Ellafi@wellsfargo.com>,
>>>> "user" <user@spark.apache.org>
>>>>
>>>> I modified DataFrameSuite, in the master branch, to call percentile_approx
>>>> instead of simpleUDF:
>>>>
>>>> - deprecated callUdf in SQLContext
>>>> - callUDF in SQLContext *** FAILED ***
>>>>   org.apache.spark.sql.AnalysisException: undefined function
>>>> percentile_approx;
>>>>   at
>>>>
>>>> org.apache.spark.sql.catalyst.analysis.SimpleFunctionRegistry$$anonfun$2.apply(FunctionRegistry.scala:64)
>>>>   at
>>>>
>>>> org.apache.spark.sql.catalyst.analysis.SimpleFunctionRegistry$$anonfun$2.apply(FunctionRegistry.scala:64)
>>>>   at scala.Option.getOrElse(Option.scala:120)
>>>>   at
>>>>
>>>> org.apache.spark.sql.catalyst.analysis.SimpleFunctionRegistry.lookupFunction(FunctionRegistry.scala:63)
>>>>   at
>>>>
>>>> org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveFunctions$$anonfun$apply$10$$anonfun$applyOrElse$5$$anonfun$applyOrElse$24.apply(Analyzer.scala:506)
>>>>   at
>>>>
>>>> org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveFunctions$$anonfun$apply$10$$anonfun$applyOrElse$5$$anonfun$applyOrElse$24.apply(Analyzer.scala:506)
>>>>   at
>>>>
>>>> org.apache.spark.sql.catalyst.analysis.package$.withPosition(package.scala:48)
>>>>   at
>>>>
>>>> org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveFunctions$$anonfun$apply$10$$anonfun$applyOrElse$5.applyOrElse(Analyzer.scala:505)
>>>>   at
>>>>
>>>> org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveFunctions$$anonfun$apply$10$$anonfun$applyOrElse$5.applyOrElse(Analyzer.scala:502)
>>>>   at
>>>>
>>>> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:227)
>>>>
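>>>> For concreteness, the modification was roughly of the following shape (a
>>>> reconstruction, not the exact test code; df and the column name are
>>>> placeholders):
>>>>
>>>>   import org.apache.spark.sql.functions._
>>>>   // swap the simpleUDF call in the "callUDF in SQLContext" test for
>>>>   // percentile_approx
>>>>   df.select(callUDF("percentile_approx", col("value"), lit(0.25)))
>>>>
>>>> This fails analysis as shown above, presumably because the lookup goes
>>>> through SQLContext's SimpleFunctionRegistry, which does not know about the
>>>> Hive UDAF.
>>>>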
>>>> SPARK-10671 is included in master.
>>>> For 1.5.1, I guess the absence of SPARK-10671 means that Spark SQL treats
>>>> percentile_approx as a normal UDF.
>>>>
>>>> Experts can correct me if there is any misunderstanding.
>>>>
>>>> Cheers
>>>>
>>>>
>>>>
>>>> --
>>>> View this message in context:
>>>> http://apache-spark-user-list.1001560.n3.nabble.com/callUdf-percentile-approx-col-mycol-lit-0-25-does-not-compile-spark-1-5-1-source-but-it-does-work-inn-tp25111.html
>>>> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>>>>
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
>>>> For additional commands, e-mail: user-help@spark.apache.org
>>>>
>>>>
>>>
>>
>
