From: michalsenkyr
To: reviews@spark.apache.org
Reply-To: reviews@spark.apache.org
Subject: [GitHub] spark pull request #16541: [SPARK-19088][SQL] Optimize sequence type deseria...
Message-Id: <20170113011439.6871CDFA22@git1-us-west.apache.org>
Date: Fri, 13 Jan 2017 01:14:39 +0000 (UTC)

Github user michalsenkyr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16541#discussion_r95921320

    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala ---
    @@ -589,6 +590,171 @@ case class MapObjects private(
         }
       }
     
    +object CollectObjects {
    +  private val curId = new java.util.concurrent.atomic.AtomicInteger()
    +
    +  /**
    +   * Construct an instance of CollectObjects case class.
    +   *
    +   * @param function The function applied on the collection elements.
    +   * @param inputData An expression that when evaluated returns a collection object.
    +   * @param elementType The data type of elements in the collection.
    +   * @param collClass The type of the resulting collection.
    +   */
    +  def apply(
    +      function: Expression => Expression,
    +      inputData: Expression,
    +      elementType: DataType,
    +      collClass: Class[_]): CollectObjects = {
    +    val loopValue = "CollectObjects_loopValue" + curId.getAndIncrement()
    +    val loopIsNull = "CollectObjects_loopIsNull" + curId.getAndIncrement()
    +    val loopVar = LambdaVariable(loopValue, loopIsNull, elementType)
    +    val builderValue = "CollectObjects_builderValue" + curId.getAndIncrement()
    +    CollectObjects(loopValue, loopIsNull, elementType, function(loopVar), inputData,
    +      collClass, builderValue)
    +  }
    +}
    +
    +/**
    + * An equivalent to the [[MapObjects]] case class but returning an ObjectType containing
    + * a Scala collection constructed using the associated builder, obtained by calling `newBuilder`
    + * on the collection's companion object.
    + *
    + * @param loopValue the name of the loop variable that used when iterate the collection, and used
    + *                  as input for the `lambdaFunction`
    + * @param loopIsNull the nullity of the loop variable that used when iterate the collection, and
    + *                   used as input for the `lambdaFunction`
    + * @param loopVarDataType the data type of the loop variable that used when iterate the collection,
    + *                        and used as input for the `lambdaFunction`
    + * @param lambdaFunction A function that take the `loopVar` as input, and used as lambda function
    + *                       to handle collection elements.
    + * @param inputData An expression that when evaluated returns a collection object.
    + * @param collClass The type of the resulting collection.
    + * @param builderValue The name of the builder variable used to construct the resulting collection.
    + */
    +case class CollectObjects private(
    +    loopValue: String,
    +    loopIsNull: String,
    +    loopVarDataType: DataType,
    +    lambdaFunction: Expression,
    +    inputData: Expression,
    +    collClass: Class[_],
    +    builderValue: String) extends Expression with NonSQLExpression {
    +
    +  override def nullable: Boolean = inputData.nullable
    +
    +  override def children: Seq[Expression] = lambdaFunction :: inputData :: Nil
    +
    +  override def eval(input: InternalRow): Any =
    +    throw new UnsupportedOperationException("Only code-generated evaluation is supported")
    +
    +  override def dataType: DataType = ObjectType(collClass)
    +
    +  override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
    +    val collObjectName = s"${collClass.getName}$$.MODULE$$"
    +    val getBuilderVar = s"$collObjectName.newBuilder()"
    --- End diff --

    I added the `Seq` builder fallback. However, there is presently no collection that Spark supports that doesn't provide a builder. You can try it out on your branch with `Range`.
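
For readers following the thread, here is a minimal plain-Scala sketch of the idea behind the comment above: a collection is assembled through the builder returned by `newBuilder` on its companion object, and a `Seq` builder serves as the fallback when the target type's companion offers none (for example `Range`). The `BuilderSketch` object and its `collect` helper are hypothetical names used only for illustration; they are not part of Spark, and the sketch does not reproduce the code that `doGenCode` actually generates or how the PR implements the fallback.

```scala
import scala.collection.mutable.Builder

// Hypothetical illustration only; not Spark code.
object BuilderSketch {
  // Feed every element into the given builder and return the finished collection.
  def collect[A, C](builder: Builder[A, C], elems: Iterator[A]): C = {
    while (elems.hasNext) {
      builder += elems.next()
    }
    builder.result()
  }

  def main(args: Array[String]): Unit = {
    val input = Array(1, 2, 3)

    // Companion objects such as List and Vector expose newBuilder directly.
    val asList: List[Int] = collect(List.newBuilder[Int], input.iterator.map(_ * 2))
    val asVector: Vector[Int] = collect(Vector.newBuilder[Int], input.iterator.map(_ * 2))

    // Range's companion has no newBuilder, so this sketch assumes a generic
    // Seq builder is used instead, mirroring the fallback mentioned in the comment.
    val asSeq: Seq[Int] = collect(Seq.newBuilder[Int], input.iterator)

    println(asList)
    println(asVector)
    println(asSeq)
  }
}
```

The helper takes the builder as a parameter, so the same element loop works for any target collection; only the choice of builder changes, which is roughly the role `collClass` plays in the diff.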