Mailing-List: contact issues-help@spark.apache.org; run by ezmlm
Precedence: bulk
Date: Fri, 30 Oct 2015 21:56:27 +0000 (UTC)
From: "Ram Kandasamy (JIRA)" <jira@apache.org>
To: issues@spark.apache.org
Message-ID: <JIRA.12909336.1446241663000.112711.1446242187793@Atlassian.JIRA>
In-Reply-To: <JIRA.12909336.1446241663000@Atlassian.JIRA>
References: <JIRA.12909336.1446241663000@Atlassian.JIRA>
 <JIRA.12909336.1446241663001@arcas>
Subject: [jira] [Closed] (SPARK-11430) DataFrame's except method does not
 work, returns 0
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit


     [ https://issues.apache.org/jira/browse/SPARK-11430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ram Kandasamy closed SPARK-11430.
---------------------------------
       Resolution: Fixed
    Fix Version/s: 1.5.1

> DataFrame's except method does not work, returns 0
> --------------------------------------------------
>
>                 Key: SPARK-11430
>                 URL: https://issues.apache.org/jira/browse/SPARK-11430
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.5.0
>            Reporter: Ram Kandasamy
>             Fix For: 1.5.1
>
>
> This may be related to https://issues.apache.org/jira/browse/SPARK-11427.
> The except method on DataFrames should mirror the subtract method on RDDs, but in the example below except returns 0 rows where subtract returns 1072046:
> scala> val firstFile = sqlContext.read.parquet("/Users/ramkandasamy/sparkData/2015-07-25/*").select("id").distinct
> firstFile: org.apache.spark.sql.DataFrame = [id: string]
> scala> val secondFile = sqlContext.read.parquet("/Users/ramkandasamy/sparkData/2015-10-23/*").select("id").distinct
> secondFile: org.apache.spark.sql.DataFrame = [id: string]
> scala> firstFile.count
> res1: Long = 1072046
> scala> secondFile.count
> res2: Long = 3569941
> scala> firstFile.except(secondFile).count
> res3: Long = 0
> scala> firstFile.rdd.subtract(secondFile.rdd).count
> res4: Long = 1072046
> Can anyone help out here? Thanks!
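
A smaller reproduction may make the comparison above easier to check. The following is only a sketch: it assumes a spark-shell session on 1.5.x where sc and sqlContext are predefined, and the tiny "first" and "second" DataFrames are made-up stand-ins for the parquet data in the report, not the reporter's data.

```scala
// Assumes spark-shell, where `sc` and `sqlContext` already exist.
import sqlContext.implicits._

// Hypothetical stand-ins for the two id columns in the report.
val first  = sc.parallelize(Seq("a", "b", "c")).toDF("id")
val second = sc.parallelize(Seq("b", "d")).toDF("id")

// DataFrame set difference: rows of `first` that do not appear in `second`.
val viaExcept = first.except(second).count

// The same difference computed on the underlying RDD[Row].
val viaSubtract = first.rdd.subtract(second.rdd).count

// The set difference here is {a, c}, so both counts should be 2
// if except mirrors subtract.
println(s"except = $viaExcept, subtract = $viaSubtract")
```

If the two counts disagree on such a small input, the problem is reproducible without the original files; if they agree, the discrepancy more likely depends on the parquet data or the distinct step in the original query.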


