spark-user mailing list archives

From Umesh Kacha <umesh.ka...@gmail.com>
Subject Re: callUdf("percentile_approx",col("mycol"),lit(0.25)) does not compile spark 1.5.1 source but it does work in spark 1.5.1 bin
Date Sun, 18 Oct 2015 15:58:40 GMT
Thanks much, Ted. So when will we be able to use this UDF from Java code built
with Maven dependencies? You said SPARK-10671 is not included in 1.5.1, so it
should be released in 1.6.0, as mentioned in the JIRA, right?

On Sun, Oct 18, 2015 at 9:20 PM, Ted Yu <yuzhihong@gmail.com> wrote:

> The UDF is defined in Hive's GenericUDAFPercentileApprox.
>
> When spark-shell runs, it has access to the above class, which is packaged
> in assembly/target/scala-2.10/spark-assembly-1.6.0-SNAPSHOT-hadoop2.7.0.jar:
>
>   2143 Fri Oct 16 15:02:26 PDT 2015 org/apache/hadoop/hive/ql/udf/generic/GenericUDAFPercentileApprox$1.class
>   4602 Fri Oct 16 15:02:26 PDT 2015 org/apache/hadoop/hive/ql/udf/generic/GenericUDAFPercentileApprox$GenericUDAFMultiplePercentileApproxEvaluator.class
>   1697 Fri Oct 16 15:02:26 PDT 2015 org/apache/hadoop/hive/ql/udf/generic/GenericUDAFPercentileApprox$GenericUDAFPercentileApproxEvaluator$PercentileAggBuf.class
>   6570 Fri Oct 16 15:02:26 PDT 2015 org/apache/hadoop/hive/ql/udf/generic/GenericUDAFPercentileApprox$GenericUDAFPercentileApproxEvaluator.class
>   4334 Fri Oct 16 15:02:26 PDT 2015 org/apache/hadoop/hive/ql/udf/generic/GenericUDAFPercentileApprox$GenericUDAFSinglePercentileApproxEvaluator.class
>   6293 Fri Oct 16 15:02:26 PDT 2015 org/apache/hadoop/hive/ql/udf/generic/GenericUDAFPercentileApprox.class
>
> That is the cause of the different behavior.
>
> FYI
>
> On Sun, Oct 18, 2015 at 12:10 AM, unk1102 <umesh.kacha@gmail.com> wrote:
>
>> Hi, I am starting a new thread as a follow-up to the old one. It looks like
>> the code needed to compile callUdf("percentile_approx",col("mycol"),lit(0.25))
>> is not merged into the Spark 1.5.1 source, but I don't understand why this
>> function call works in the Spark 1.5.1 binary distribution (spark-shell).
>> Please advise.
>>
>> ---------- Forwarded message ----------
>> From: "Ted Yu" <yuzhihong@gmail.com>
>> Date: Oct 14, 2015 3:26 AM
>> Subject: Re: How to calculate percentile of a column of DataFrame?
>> To: "Umesh Kacha" <umesh.kacha@gmail.com>
>> Cc: "Michael Armbrust" <michael@databricks.com>,
>> "<Saif.A.Ellafi@wellsfargo.com>" <Saif.A.Ellafi@wellsfargo.com>,
>> "user" <user@spark.apache.org>
>>
>> I modified DataFrameSuite in the master branch to call percentile_approx
>> instead of simpleUDF:
>>
>> - deprecated callUdf in SQLContext
>> - callUDF in SQLContext *** FAILED ***
>>   org.apache.spark.sql.AnalysisException: undefined function percentile_approx;
>>   at org.apache.spark.sql.catalyst.analysis.SimpleFunctionRegistry$$anonfun$2.apply(FunctionRegistry.scala:64)
>>   at org.apache.spark.sql.catalyst.analysis.SimpleFunctionRegistry$$anonfun$2.apply(FunctionRegistry.scala:64)
>>   at scala.Option.getOrElse(Option.scala:120)
>>   at org.apache.spark.sql.catalyst.analysis.SimpleFunctionRegistry.lookupFunction(FunctionRegistry.scala:63)
>>   at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveFunctions$$anonfun$apply$10$$anonfun$applyOrElse$5$$anonfun$applyOrElse$24.apply(Analyzer.scala:506)
>>   at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveFunctions$$anonfun$apply$10$$anonfun$applyOrElse$5$$anonfun$applyOrElse$24.apply(Analyzer.scala:506)
>>   at org.apache.spark.sql.catalyst.analysis.package$.withPosition(package.scala:48)
>>   at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveFunctions$$anonfun$apply$10$$anonfun$applyOrElse$5.applyOrElse(Analyzer.scala:505)
>>   at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveFunctions$$anonfun$apply$10$$anonfun$applyOrElse$5.applyOrElse(Analyzer.scala:502)
>>   at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:227)
>>
>> SPARK-10671 is included in master.
>> For 1.5.1, I guess the absence of SPARK-10671 means that Spark SQL treats
>> percentile_approx as a normal UDF.
>>
>> Experts can correct me if I have misunderstood anything.
>>
>> Cheers
>>
>>
>>
>> --
>> View this message in context:
>> http://apache-spark-user-list.1001560.n3.nabble.com/callUdf-percentile-approx-col-mycol-lit-0-25-does-not-compile-spark-1-5-1-source-but-it-does-work-inn-tp25111.html
>> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>>
>
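Ted's explanation comes down to where a function name gets looked up. Below is a hypothetical Java model of that idea, not Spark code: a plain registry (standing in for the SimpleFunctionRegistry in the stack trace) only knows functions registered with it, so it fails with "undefined function percentile_approx", while a Hive-aware registry that can fall back to Hive classes on the classpath (like the GenericUDAFPercentileApprox classes in the assembly jar) resolves the same name. The class names, the string "implementations", and the assumption that the Hive-aware lookup tries the built-in registry first are illustrative simplifications.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical, simplified model of why the same function name resolves in one
// context but not another. These classes are NOT Spark or Hive APIs.
public class FunctionLookupSketch {

    // Mirrors the "undefined function <name>" message seen in the stack trace.
    static class UndefinedFunctionException extends RuntimeException {
        UndefinedFunctionException(String name) {
            super("undefined function " + name);
        }
    }

    // A registry maps a function name to (a description of) its implementation.
    interface Registry {
        String lookup(String name);
    }

    // Plain registry: only functions explicitly registered with it are visible.
    static class SimpleRegistry implements Registry {
        private final Map<String, String> functions = new HashMap<>();

        void register(String name, String impl) {
            functions.put(name, impl);
        }

        @Override
        public String lookup(String name) {
            String impl = functions.get(name);
            if (impl == null) {
                throw new UndefinedFunctionException(name);
            }
            return impl;
        }
    }

    // Hive-aware registry (assumed behaviour): try the underlying registry first,
    // then fall back to Hive function classes available on the classpath.
    static class HiveBackedRegistry implements Registry {
        private final Registry underlying;
        private final Map<String, String> hiveClasses;

        HiveBackedRegistry(Registry underlying, Map<String, String> hiveClasses) {
            this.underlying = underlying;
            this.hiveClasses = hiveClasses;
        }

        @Override
        public String lookup(String name) {
            try {
                return underlying.lookup(name);
            } catch (UndefinedFunctionException e) {
                // Names are matched case-insensitively in this sketch.
                String hiveClass = hiveClasses.get(name.toLowerCase(Locale.ROOT));
                if (hiveClass == null) {
                    throw e;
                }
                return hiveClass;
            }
        }
    }

    public static void main(String[] args) {
        SimpleRegistry builtin = new SimpleRegistry();
        builtin.register("simpleUDF", "user-registered function");

        // Stand-in for the Hive classes packaged in the assembly jar.
        Map<String, String> hiveOnClasspath = new HashMap<>();
        hiveOnClasspath.put("percentile_approx",
                "org.apache.hadoop.hive.ql.udf.generic.GenericUDAFPercentileApprox");
        Registry hiveAware = new HiveBackedRegistry(builtin, hiveOnClasspath);

        // Resolves through the Hive fallback.
        System.out.println(hiveAware.lookup("percentile_approx"));

        // Fails the way the plain SQLContext lookup does in the test above.
        try {
            builtin.lookup("percentile_approx");
        } catch (UndefinedFunctionException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Running main prints the Hive class name from the Hive-aware lookup, then "undefined function percentile_approx" from the plain registry. Modelling the fallback as a try/catch around the underlying lookup keeps the two paths separate, so a name that neither registry knows still surfaces the original error.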
