Date: Tue, 1 Dec 2015 15:46:11 +0000 (UTC)
From: "Yin Huai (JIRA)"
To: issues@spark.apache.org
Subject: [jira] [Updated] (SPARK-11949) Query on DataFrame from cube gives wrong results

     [ https://issues.apache.org/jira/browse/SPARK-11949?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yin Huai updated SPARK-11949:
-----------------------------
    Assignee: Liang-Chi Hsieh

> Query on DataFrame from cube gives wrong results
> ------------------------------------------------
>
>                 Key: SPARK-11949
>                 URL: https://issues.apache.org/jira/browse/SPARK-11949
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.5.1
>            Reporter: Veli Kerim Celik
>            Assignee: Liang-Chi Hsieh
>              Labels: dataframe, sql
>             Fix For: 1.6.0
>
>
> {code:title=Reproduce bug|borderStyle=solid}
> case class fact(date: Int, hour: Int, minute: Int, room_name: String, temp: Double)
> val df0 = sc.parallelize(Seq(
>   fact(20151123, 18, 35, "room1", 18.6),
>   fact(20151123, 18, 35, "room2", 22.4),
>   fact(20151123, 18, 36, "room1", 17.4),
>   fact(20151123, 18, 36, "room2", 25.6)
> )).toDF()
> val cube0 = df0.cube("date", "hour", "minute", "room_name").agg(Map(
>   "temp" -> "avg"
> ))
> cube0.where("date IS NULL").show()
> {code}
> The query result is empty. It should not be, because cube0 contains the value null several times in column 'date'. The issue arises because the cube function reuses the schema information from df0. If I change the type of parameters in the case class to Option[T] the query gives correct results.
> Solution: The cube function should change the schema by changing the nullable property to true, for the columns (dimensions) specified in the method call parameters.
> I am new at Scala and Spark. I don't know how to implement this. Somebody please do.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
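
The report attributes the empty result to the dimension columns keeping `nullable = false` from df0's schema, and notes that Option[T] fields avoid the problem. As a rough illustration of that idea on the caller's side, the sketch below marks the dimension columns as nullable before calling cube. It is not the fix that went into Spark: `relaxNullability` and `withNullableDims` are hypothetical helper names, the code assumes a Spark 1.5-style `SQLContext`, and it has not been verified against the affected version.

```scala
import org.apache.spark.sql.{DataFrame, SQLContext}
import org.apache.spark.sql.types.StructType

// Hypothetical helper (not part of Spark): returns a copy of `schema` in which
// every column named in `dims` is marked nullable; other columns are unchanged.
def relaxNullability(schema: StructType, dims: Set[String]): StructType =
  StructType(schema.fields.map { f =>
    if (dims.contains(f.name)) f.copy(nullable = true) else f
  })

// Hypothetical helper: rebuilds `df` from its rows using the relaxed schema,
// so that later operators see the dimension columns as nullable.
def withNullableDims(sqlContext: SQLContext, df: DataFrame, dims: Seq[String]): DataFrame =
  sqlContext.createDataFrame(df.rdd, relaxNullability(df.schema, dims.toSet))

// Possible usage in spark-shell, reusing df0 from the report:
//   val dims  = Seq("date", "hour", "minute", "room_name")
//   val df1   = withNullableDims(sqlContext, df0, dims)
//   val cube1 = df1.cube(dims.head, dims.tail: _*).agg(Map("temp" -> "avg"))
//   cube1.where("date IS NULL").show()
```

This mirrors what the Option[T] change does to the schema without editing the case class. Whether it yields the expected rows on 1.5.1 is an assumption based on the reporter's observation, not a tested result.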