spark-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "assaf.mendelson" <>
Subject implement UDF/UDAF supporting whole stage codegen
Date Wed, 07 Sep 2016 09:38:26 GMT
I want to write a UDF/UDAF which provides native processing performance. Currently, when creating
a UDF/UDAF in a normal manner the performance is hit because it breaks optimizations.
For a simple example I wanted to create a UDF which tests whether the value is smaller than
I tried something like this :

import org.apache.spark.sql.catalyst.InternalRow
import org.apache.spark.sql.catalyst.analysis.TypeCheckResult
import org.apache.spark.sql.catalyst.expressions.codegen.{CodegenContext, ExprCode}
import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
import org.apache.spark.sql.catalyst.util.TypeUtils
import org.apache.spark.sql.types._
import org.apache.spark.util.Utils
import org.apache.spark.sql.catalyst.expressions._

case class genf(child: Expression) extends UnaryExpression with Predicate with ImplicitCastInputTypes

  override def inputTypes: Seq[AbstractDataType] = Seq(IntegerType)

  override def toString: String = s"$child < 10"

  override def eval(input: InternalRow): Any = {
    val value = child.eval(input)
    if (value == null)
    } else {
      child.dataType match {
        case IntegerType => value.asInstanceOf[Int] < 10

  override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
   defineCodeGen(ctx, ev, c => s"($c) < 10")

However, this doesn't work as some of the underlying classes/traits are private (e.g. AbstractDataType
is private) making it problematic to create a new case class.
Is there a way to do it? The idea is to provide a couple of jars with a bunch of functions
our team needs.

View this message in context:
Sent from the Apache Spark Developers List mailing list archive at
View raw message