From issues-return-210185-archive-asf-public=cust-asf.ponee.io@spark.apache.org  Mon Dec 10 19:48:50 2018
Return-Path: <issues-return-210185-archive-asf-public=cust-asf.ponee.io@spark.apache.org>
X-Original-To: archive-asf-public@cust-asf.ponee.io
Delivered-To: archive-asf-public@cust-asf.ponee.io
Received: from mail.apache.org (hermes.apache.org [140.211.11.3])
	by mx-eu-01.ponee.io (Postfix) with SMTP id D863F180648
	for <archive-asf-public@cust-asf.ponee.io>; Mon, 10 Dec 2018 19:48:49 +0100 (CET)
Received: (qmail 58331 invoked by uid 500); 10 Dec 2018 18:48:49 -0000
Mailing-List: contact issues-help@spark.apache.org; run by ezmlm
Precedence: bulk
List-Help: <mailto:issues-help@spark.apache.org>
List-Unsubscribe: <mailto:issues-unsubscribe@spark.apache.org>
List-Post: <mailto:issues@spark.apache.org>
List-Id: <issues.spark.apache.org>
Delivered-To: mailing list issues@spark.apache.org
Received: (qmail 58322 invoked by uid 99); 10 Dec 2018 18:48:49 -0000
Received: from pnap-us-west-generic-nat.apache.org (HELO spamd4-us-west.apache.org) (209.188.14.142)
    by apache.org (qpsmtpd/0.29) with ESMTP; Mon, 10 Dec 2018 18:48:49 +0000
Received: from localhost (localhost [127.0.0.1])
	by spamd4-us-west.apache.org (ASF Mail Server at spamd4-us-west.apache.org) with ESMTP id 9CA59C035A
	for <issues@spark.apache.org>; Mon, 10 Dec 2018 18:48:48 +0000 (UTC)
X-Virus-Scanned: Debian amavisd-new at spamd4-us-west.apache.org
X-Spam-Flag: NO
X-Spam-Score: -109.501
X-Spam-Level:
X-Spam-Status: No, score=-109.501 tagged_above=-999 required=6.31
	tests=[ENV_AND_HDR_SPF_MATCH=-0.5, KAM_ASCII_DIVIDERS=0.8,
	RCVD_IN_DNSWL_MED=-2.3, SPF_PASS=-0.001, USER_IN_DEF_SPF_WL=-7.5,
	USER_IN_WHITELIST=-100] autolearn=disabled
Received: from mx1-lw-us.apache.org ([10.40.0.8])
	by localhost (spamd4-us-west.apache.org [10.40.0.11]) (amavisd-new, port 10024)
	with ESMTP id vgIVBp6vEs8X for <issues@spark.apache.org>;
	Mon, 10 Dec 2018 18:48:47 +0000 (UTC)
Received: from mailrelay1-us-west.apache.org (mailrelay1-us-west.apache.org [209.188.14.139])
	by mx1-lw-us.apache.org (ASF Mail Server at mx1-lw-us.apache.org) with ESMTP id 26B9B60F35
	for <issues@spark.apache.org>; Mon, 10 Dec 2018 18:42:01 +0000 (UTC)
Received: from jira-lw-us.apache.org (unknown [207.244.88.139])
	by mailrelay1-us-west.apache.org (ASF Mail Server at mailrelay1-us-west.apache.org) with ESMTP id AC514E0E87
	for <issues@spark.apache.org>; Mon, 10 Dec 2018 18:42:00 +0000 (UTC)
Received: from jira-lw-us.apache.org (localhost [127.0.0.1])
	by jira-lw-us.apache.org (ASF Mail Server at jira-lw-us.apache.org) with ESMTP id 6DCF0252EE
	for <issues@spark.apache.org>; Mon, 10 Dec 2018 18:42:00 +0000 (UTC)
Date: Mon, 10 Dec 2018 18:42:00 +0000 (UTC)
From: "ASF GitHub Bot (JIRA)" <jira@apache.org>
To: issues@spark.apache.org
Message-ID: <JIRA.13166106.1528984504000.128344.1544467320448@Atlassian.JIRA>
In-Reply-To: <JIRA.13166106.1528984504000@Atlassian.JIRA>
References: <JIRA.13166106.1528984504000@Atlassian.JIRA> <JIRA.13166106.1528984504966@jira-lw-us.apache.org>
Subject: [jira] [Commented] (SPARK-24561) User-defined window functions with
 pandas udf (bounded window)
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394


    [ https://issues.apache.org/jira/browse/SPARK-24561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16715337#comment-16715337 ] 

ASF GitHub Bot commented on SPARK-24561:
----------------------------------------

icexelloss commented on a change in pull request #22305: [SPARK-24561][SQL][Python] User-defined window aggregation functions with Pandas UDF (bounded window)
URL: https://github.com/apache/spark/pull/22305#discussion_r240331939
 
 
 ##########
 File path: sql/core/src/main/scala/org/apache/spark/sql/execution/python/WindowInPandasExec.scala
 ##########
 @@ -144,24 +282,107 @@ case class WindowInPandasExec(
         queue.close()
       }
 
-      val inputProj = UnsafeProjection.create(allInputs, child.output)
-      val pythonInput = grouped.map { case (_, rows) =>
-        rows.map { row =>
-          queue.add(row.asInstanceOf[UnsafeRow])
-          inputProj(row)
+      val stream = iter.map { row =>
+        queue.add(row.asInstanceOf[UnsafeRow])
+        row
+      }
+
+      val pythonInput = new Iterator[Iterator[UnsafeRow]] {
+
+        // Manage the stream and the grouping.
+        var nextRow: UnsafeRow = null
+        var nextGroup: UnsafeRow = null
+        var nextRowAvailable: Boolean = false
+        private[this] def fetchNextRow() {
+          nextRowAvailable = stream.hasNext
+          if (nextRowAvailable) {
+            nextRow = stream.next().asInstanceOf[UnsafeRow]
+            nextGroup = grouping(nextRow)
+          } else {
+            nextRow = null
+            nextGroup = null
+          }
+        }
+        fetchNextRow()
+
+        // Manage the current partition.
+        val buffer: ExternalAppendOnlyUnsafeRowArray =
+          new ExternalAppendOnlyUnsafeRowArray(inMemoryThreshold, spillThreshold)
+        var bufferIterator: Iterator[UnsafeRow] = _
+
+        val indexRow = new SpecificInternalRow(Array.fill(numBoundIndices)(IntegerType))
+
+        val frames = factories.map(_(indexRow))
+
+        private[this] def fetchNextPartition() {
+          // Collect all the rows in the current partition.
+          // Before we start to fetch new input rows, make a copy of nextGroup.
+          val currentGroup = nextGroup.copy()
+
+          // clear last partition
+          buffer.clear()
+
+          while (nextRowAvailable && nextGroup == currentGroup) {
 
 Review comment:
   Sure. Here https://github.com/apache/spark/pull/23279

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


> User-defined window functions with pandas udf (bounded window)
> --------------------------------------------------------------
>
>                 Key: SPARK-24561
>                 URL: https://issues.apache.org/jira/browse/SPARK-24561
>             Project: Spark
>          Issue Type: Sub-task
>          Components: PySpark
>    Affects Versions: 2.3.1
>            Reporter: Li Jin
>            Priority: Major
>


--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org