spark-user mailing list archives

From Ted Yu <yuzhih...@gmail.com>
Subject Re: callUdf("percentile_approx",col("mycol"),lit(0.25)) does not compile spark 1.5.1 source but it does work in spark 1.5.1 bin
Date Mon, 19 Oct 2015 03:02:09 GMT
Umesh:

$ jar tvf /home/hbase/.m2/repository/org/spark-project/hive/hive-exec/1.2.1.spark/hive-exec-1.2.1.spark.jar | grep GenericUDAFPercentile
  2143 Fri Jul 31 23:51:48 PDT 2015 org/apache/hadoop/hive/ql/udf/generic/GenericUDAFPercentileApprox$1.class
  4602 Fri Jul 31 23:51:48 PDT 2015 org/apache/hadoop/hive/ql/udf/generic/GenericUDAFPercentileApprox$GenericUDAFMultiplePercentileApproxEvaluator.class
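
The listing above inspects the jar from the shell. As a rough sketch only, a similar check can be made from inside a Java application by asking the running JVM whether it can see the class. The class name UdfClasspathCheck and the helper isOnClasspath are hypothetical names made up for this illustration; only the Hive class name comes from the listing.

```java
final class UdfClasspathCheck {
    // Hive class that implements percentile_approx, as named in this thread.
    static final String UDAF_CLASS =
        "org.apache.hadoop.hive.ql.udf.generic.GenericUDAFPercentileApprox";

    /** Returns true if the named class is visible to this class's class loader. */
    static boolean isOnClasspath(String className) {
        try {
            // initialize=false: only locate the class, do not run its static initializers.
            Class.forName(className, false, UdfClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // Not found, or found but unusable because one of its own dependencies is missing.
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(UDAF_CLASS + " on classpath: " + isOnClasspath(UDAF_CLASS));
    }
}
```

Run with the same classpath as the application: a result of false would suggest that hive-exec is not reaching the application at runtime.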

As long as the following dependency is in your pom.xml:
[INFO] +- org.spark-project.hive:hive-exec:jar:1.2.1.spark:compile

you should be able to invoke percentile_approx.
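
For reference, the coordinates in the [INFO] line above map onto a pom.xml entry like the one sketched here. Declaring it directly is an assumption made for illustration: the artifact may already arrive transitively through another Spark module, in which case no explicit entry is needed.

```xml
<!-- Sketch only: coordinates copied from the dependency line quoted above. -->
<dependency>
  <groupId>org.spark-project.hive</groupId>
  <artifactId>hive-exec</artifactId>
  <version>1.2.1.spark</version>
  <!-- compile is Maven's default scope; it is spelled out to match the line above -->
  <scope>compile</scope>
</dependency>
```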

Cheers

On Sun, Oct 18, 2015 at 8:58 AM, Umesh Kacha <umesh.kacha@gmail.com> wrote:

> Thanks very much, Ted. So when will we be able to use this UDF from Java
> code built with Maven dependencies? You said SPARK-10671 was not included in
> 1.5.1, so it should be released in 1.6.0 as mentioned in the JIRA, right?
>
> On Sun, Oct 18, 2015 at 9:20 PM, Ted Yu <yuzhihong@gmail.com> wrote:
>
>> The udf is defined in GenericUDAFPercentileApprox of hive.
>>
>> When spark-shell runs, it has access to the above class, which is packaged in
>> assembly/target/scala-2.10/spark-assembly-1.6.0-SNAPSHOT-hadoop2.7.0.jar:
>>
>>   2143 Fri Oct 16 15:02:26 PDT 2015 org/apache/hadoop/hive/ql/udf/generic/GenericUDAFPercentileApprox$1.class
>>   4602 Fri Oct 16 15:02:26 PDT 2015 org/apache/hadoop/hive/ql/udf/generic/GenericUDAFPercentileApprox$GenericUDAFMultiplePercentileApproxEvaluator.class
>>   1697 Fri Oct 16 15:02:26 PDT 2015 org/apache/hadoop/hive/ql/udf/generic/GenericUDAFPercentileApprox$GenericUDAFPercentileApproxEvaluator$PercentileAggBuf.class
>>   6570 Fri Oct 16 15:02:26 PDT 2015 org/apache/hadoop/hive/ql/udf/generic/GenericUDAFPercentileApprox$GenericUDAFPercentileApproxEvaluator.class
>>   4334 Fri Oct 16 15:02:26 PDT 2015 org/apache/hadoop/hive/ql/udf/generic/GenericUDAFPercentileApprox$GenericUDAFSinglePercentileApproxEvaluator.class
>>   6293 Fri Oct 16 15:02:26 PDT 2015 org/apache/hadoop/hive/ql/udf/generic/GenericUDAFPercentileApprox.class
>>
>> That is the cause of the different behavior.
>>
>> FYI
>>
>> On Sun, Oct 18, 2015 at 12:10 AM, unk1102 <umesh.kacha@gmail.com> wrote:
>>
>>> Hi, I am starting a new thread to follow up on the old one. It looks like
>>> the code needed to compile
>>> callUdf("percentile_approx",col("mycol"),lit(0.25)) is not merged into the
>>> Spark 1.5.1 source, but I don't understand why the same call works in the
>>> Spark 1.5.1 spark-shell from the binary distribution. Please advise.
>>>
>>> ---------- Forwarded message ----------
>>> From: "Ted Yu" <yuzhihong@gmail.com>
>>> Date: Oct 14, 2015 3:26 AM
>>> Subject: Re: How to calculate percentile of a column of DataFrame?
>>> To: "Umesh Kacha" <umesh.kacha@gmail.com>
>>> Cc: "Michael Armbrust" <michael@databricks.com>,
>>> "<Saif.A.Ellafi@wellsfargo.com>" <Saif.A.Ellafi@wellsfargo.com>,
>>> "user" <user@spark.apache.org>
>>>
>>> I modified DataFrameSuite, in the master branch, to call percentile_approx
>>> instead of simpleUDF:
>>>
>>> - deprecated callUdf in SQLContext
>>> - callUDF in SQLContext *** FAILED ***
>>>   org.apache.spark.sql.AnalysisException: undefined function percentile_approx;
>>>   at org.apache.spark.sql.catalyst.analysis.SimpleFunctionRegistry$$anonfun$2.apply(FunctionRegistry.scala:64)
>>>   at org.apache.spark.sql.catalyst.analysis.SimpleFunctionRegistry$$anonfun$2.apply(FunctionRegistry.scala:64)
>>>   at scala.Option.getOrElse(Option.scala:120)
>>>   at org.apache.spark.sql.catalyst.analysis.SimpleFunctionRegistry.lookupFunction(FunctionRegistry.scala:63)
>>>   at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveFunctions$$anonfun$apply$10$$anonfun$applyOrElse$5$$anonfun$applyOrElse$24.apply(Analyzer.scala:506)
>>>   at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveFunctions$$anonfun$apply$10$$anonfun$applyOrElse$5$$anonfun$applyOrElse$24.apply(Analyzer.scala:506)
>>>   at org.apache.spark.sql.catalyst.analysis.package$.withPosition(package.scala:48)
>>>   at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveFunctions$$anonfun$apply$10$$anonfun$applyOrElse$5.applyOrElse(Analyzer.scala:505)
>>>   at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveFunctions$$anonfun$apply$10$$anonfun$applyOrElse$5.applyOrElse(Analyzer.scala:502)
>>>   at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:227)
>>>
>>> SPARK-10671 is included in the master branch I tested.
>>> For 1.5.1, I guess the absence of SPARK-10671 means that Spark SQL treats
>>> percentile_approx as a normal UDF.
>>>
>>> Experts can correct me, if there is any misunderstanding.
>>>
>>> Cheers
>>>
>>
>
