Mailing-List: contact issues-help@spark.apache.org; run by ezmlm
Precedence: bulk
Date: Wed, 7 Dec 2016 17:31:59 +0000 (UTC)
From: "Herman van Hovell (JIRA)" <jira@apache.org>
To: issues@spark.apache.org
Message-ID: <JIRA.13009129.1475426102000.462686.1481131919073@Atlassian.JIRA>
In-Reply-To: <JIRA.13009129.1475426102000@Atlassian.JIRA>
References: <JIRA.13009129.1475426102000@Atlassian.JIRA> <JIRA.13009129.1475426102129@arcas>
Subject: [jira] [Updated] (SPARK-17760) DataFrame's pivot doesn't see column
 created in groupBy
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
archived-at: Wed, 07 Dec 2016 17:32:00 -0000


     [ https://issues.apache.org/jira/browse/SPARK-17760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Herman van Hovell updated SPARK-17760:
--------------------------------------
    Fix Version/s: 2.0.3

> DataFrame's pivot doesn't see column created in groupBy
> -------------------------------------------------------
>
>                 Key: SPARK-17760
>                 URL: https://issues.apache.org/jira/browse/SPARK-17760
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 2.0.0
>         Environment: Databricks Community Edition, Spark 2.0.0, PySpark, Python 2.
>            Reporter: Alberto Bonsanto
>            Assignee: Andrew Ray
>              Labels: easytest, newbie
>             Fix For: 2.0.3, 2.1.0
>
>
> Related to [https://stackoverflow.com/questions/39817993/pivoting-with-missing-values]. I'm not completely sure if this is a bug or expected behavior.
> When you `groupBy` an expression that is created inside the `groupBy` call itself, rather than an existing column, the `pivot` method apparently cannot resolve it during analysis.
> For example:
> {code:none}
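> from pyspark.sql.functions import col, dayofyear, hour
>
> # Two sample rows; "Date" is cast from string to timestamp.
> df = (sc.parallelize([(1.0, "2016-03-30 01:00:00"),
>                       (30.2, "2015-01-02 03:00:02")])
>         .toDF(["amount", "Date"])
>         .withColumn("Date", col("Date").cast("timestamp")))
> # Grouping by an expression built inside groupBy, then pivoting, raises the error.
> (df.withColumn("hour", hour("date"))
>    .groupBy(dayofyear("date").alias("date"))
>    .pivot("hour").sum("amount").show()){code}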
> This raises the following exception:
> {quote}
> AnalysisException: u'resolved attribute(s) date#140688 missing from dayofyear(date)#140994,hour#140977,sum(`amount`)#140995 in operator !Aggregate \[dayofyear(cast(date#140688 as date))], [dayofyear(cast(date#140688 as date)) AS dayofyear(date)#140994, pivotfirst(hour#140977, sum(`amount`)#140995, 1, 3, 0, 0) AS __pivot_sum(`amount`) AS `sum(``amount``)`#141001\];'
> {quote}
> As a workaround, add the {{date}} column to the DataFrame (for example with {{withColumn}}) before grouping and pivoting.

