Return-Path: X-Original-To: archive-asf-public-internal@cust-asf2.ponee.io Delivered-To: archive-asf-public-internal@cust-asf2.ponee.io Received: from cust-asf.ponee.io (cust-asf.ponee.io [163.172.22.183]) by cust-asf2.ponee.io (Postfix) with ESMTP id B4B9C200D0A for ; Wed, 4 Oct 2017 19:42:49 +0200 (CEST) Received: by cust-asf.ponee.io (Postfix) id B2E681609DD; Wed, 4 Oct 2017 17:42:49 +0000 (UTC) Delivered-To: archive-asf-public@cust-asf.ponee.io Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by cust-asf.ponee.io (Postfix) with SMTP id 2B0AF1609D6 for ; Wed, 4 Oct 2017 19:42:49 +0200 (CEST) Received: (qmail 92127 invoked by uid 500); 4 Oct 2017 17:42:48 -0000 Mailing-List: contact reviews-help@spark.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Delivered-To: mailing list reviews@spark.apache.org Received: (qmail 92111 invoked by uid 99); 4 Oct 2017 17:42:48 -0000 Received: from git1-us-west.apache.org (HELO git1-us-west.apache.org) (140.211.11.23) by apache.org (qpsmtpd/0.29) with ESMTP; Wed, 04 Oct 2017 17:42:48 +0000 Received: by git1-us-west.apache.org (ASF Mail Server at git1-us-west.apache.org, from userid 33) id 02A98F566C; Wed, 4 Oct 2017 17:42:48 +0000 (UTC) From: icexelloss To: reviews@spark.apache.org Reply-To: reviews@spark.apache.org References: In-Reply-To: Subject: [GitHub] spark pull request #18732: [SPARK-20396][SQL][PySpark] groupby().apply() wit... Content-Type: text/plain Message-Id: <20171004174248.02A98F566C@git1-us-west.apache.org> Date: Wed, 4 Oct 2017 17:42:48 +0000 (UTC) archived-at: Wed, 04 Oct 2017 17:42:49 -0000 Github user icexelloss commented on a diff in the pull request: https://github.com/apache/spark/pull/18732#discussion_r142740947 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/RelationalGroupedDataset.scala --- @@ -435,6 +435,33 @@ class RelationalGroupedDataset protected[sql]( df.logicalPlan.output, df.logicalPlan)) } + + private[sql] def flatMapGroupsInPandas(expr: PythonUDF): DataFrame = { + require(expr.vectorized, "Must pass a vectorized python udf") + + val output = expr.dataType match { + case s: StructType => s.map { + case StructField(name, dataType, nullable, metadata) => + AttributeReference(name, dataType, nullable, metadata)() + } + } + + val groupingAttributes: Seq[Attribute] = groupingExprs.map { + case ne: NamedExpression => ne.toAttribute + } + + val plan = FlatMapGroupsInPandas( + groupingAttributes, + expr, + output, + df.logicalPlan + ) + + Dataset.ofRows( --- End diff -- Fixed. --- --------------------------------------------------------------------- To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org For additional commands, e-mail: reviews-help@spark.apache.org