spark-dev mailing list archives

From Reynold Xin
Subject Re: In Spark-SQL, is there support for distributed execution of native Hive UDAFs?
Date Thu, 23 Apr 2015 08:35:16 GMT
Your understanding is correct -- Spark SQL currently does no partial
aggregation for Hive UDAFs.

However, there is a PR to fix that:

On Thu, Apr 23, 2015 at 1:30 AM, daniel.mescheder wrote:

> Hi everyone,
> I was playing with the integration of Hive UDAFs in Spark-SQL and noticed
> that the terminatePartial and merge methods of custom UDAFs were not
> called. This made me curious as those two methods are the ones responsible
> for distributing the UDAF execution in Hive.
> Looking at the code of HiveUdafFunction, which seems to be the wrapper for
> all native Hive functions for which there exists no spark-sql specific
> implementation, I noticed that it
> a) extends AggregateFunction and not PartialAggregate
> b) only contains calls to iterate and evaluate, but never calls merge on
> the underlying UDAFEvaluator object
> My question is thus twofold: Is my observation correct, that to achieve
> distributed execution of a UDAF I have to add a custom implementation at
> the spark-sql layer (like the examples in aggregates.scala)? If that is the
> case, how difficult would it be to use the terminatePartial and merge
> functions provided by the UDAFEvaluator to make Hive UDAFs distributed by
> default?
> Cheers,
> Daniel
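
For readers unfamiliar with the four-method contract discussed above, here is a
minimal, self-contained sketch of how iterate, terminatePartial, merge and
terminate fit together. It does not use the Hive or Spark APIs: AvgEvaluator,
Partial and PartialAggregationSketch are hypothetical names invented for
illustration, and the "partitions" are plain arrays. It only shows why the
single-pass path (iterate plus a final evaluation, as the wrapper described
above is reported to do) and the partial-aggregation path give the same result.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a Hive-style UDAF evaluator computing an average.
// The method names mirror the Hive contract; the class is illustrative only
// and is not part of Hive or Spark.
final class AvgEvaluator {
    // Partial state shipped between stages: running sum and row count.
    static final class Partial {
        final double sum;
        final long count;
        Partial(double sum, long count) { this.sum = sum; this.count = count; }
    }

    private double sum = 0.0;
    private long count = 0L;

    // Map side: consume one raw input row.
    void iterate(double value) { sum += value; count++; }

    // Map side: emit the partial state accumulated for this partition.
    Partial terminatePartial() { return new Partial(sum, count); }

    // Reduce side: fold another partition's partial state into this one.
    void merge(Partial other) { sum += other.sum; count += other.count; }

    // Final result; null when no rows were seen.
    Double terminate() { return count == 0 ? null : sum / count; }
}

final class PartialAggregationSketch {
    // Non-distributed path: every row goes through a single evaluator,
    // so terminatePartial and merge are never called.
    static Double singlePass(List<double[]> partitions) {
        AvgEvaluator eval = new AvgEvaluator();
        for (double[] partition : partitions) {
            for (double v : partition) eval.iterate(v);
        }
        return eval.terminate();
    }

    // Partial-aggregation path: one evaluator per partition, then the
    // partial states are merged into a final evaluator.
    static Double distributed(List<double[]> partitions) {
        AvgEvaluator reducer = new AvgEvaluator();
        for (double[] partition : partitions) {
            AvgEvaluator mapper = new AvgEvaluator();
            for (double v : partition) mapper.iterate(v);
            reducer.merge(mapper.terminatePartial());
        }
        return reducer.terminate();
    }

    public static void main(String[] args) {
        List<double[]> partitions = Arrays.asList(
            new double[]{1, 2, 3}, new double[]{4, 5}, new double[]{});
        System.out.println(singlePass(partitions));
        System.out.println(distributed(partitions));
    }
}
```

Both paths agree on the result; the difference is that the second one only
moves one small Partial per partition to the final stage instead of every row.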
