From: icexelloss <git@git.apache.org>
To: reviews@spark.apache.org
Reply-To: reviews@spark.apache.org
References: <git-pr-18732-spark@git.apache.org>
In-Reply-To: <git-pr-18732-spark@git.apache.org>
Subject: [GitHub] spark pull request #18732: [SPARK-20396][SQL][PySpark] groupby().apply() wit...
Content-Type: text/plain
Message-Id: <20171004174248.02A98F566C@git1-us-west.apache.org>
Date: Wed,  4 Oct 2017 17:42:48 +0000 (UTC)
archived-at: Wed, 04 Oct 2017 17:42:49 -0000

Github user icexelloss commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18732#discussion_r142740947
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/RelationalGroupedDataset.scala ---
    @@ -435,6 +435,33 @@ class RelationalGroupedDataset protected[sql](
               df.logicalPlan.output,
               df.logicalPlan))
       }
    +
    +  private[sql] def flatMapGroupsInPandas(expr: PythonUDF): DataFrame = {
    +    require(expr.vectorized, "Must pass a vectorized python udf")
    +
    +    val output = expr.dataType match {
    +      case s: StructType => s.map {
    +        case StructField(name, dataType, nullable, metadata) =>
    +          AttributeReference(name, dataType, nullable, metadata)()
    +      }
    +    }
    +
    +    val groupingAttributes: Seq[Attribute] = groupingExprs.map {
    +      case ne: NamedExpression => ne.toAttribute
    +    }
    +
    +    val plan = FlatMapGroupsInPandas(
    +      groupingAttributes,
    +      expr,
    +      output,
    +      df.logicalPlan
    +    )
    +
    +    Dataset.ofRows(
    --- End diff --
    
    Fixed.

