spark-issues mailing list archives

From "Yin Huai (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (SPARK-11949) Query on DataFrame from cube gives wrong results
Date Tue, 01 Dec 2015 15:46:11 GMT

     [ https://issues.apache.org/jira/browse/SPARK-11949?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yin Huai updated SPARK-11949:
-----------------------------
    Assignee: Liang-Chi Hsieh

> Query on DataFrame from cube gives wrong results
> ------------------------------------------------
>
>                 Key: SPARK-11949
>                 URL: https://issues.apache.org/jira/browse/SPARK-11949
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.5.1
>            Reporter: Veli Kerim Celik
>            Assignee: Liang-Chi Hsieh
>              Labels: dataframe, sql
>             Fix For: 1.6.0
>
>
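> {code:title=Reproduce bug|borderStyle=solid}
> // Run in spark-shell (Spark 1.5.x), where sc and the toDF() implicits are already in scope.
> case class fact(date: Int, hour: Int, minute: Int, room_name: String, temp: Double)
>
> val df0 = sc.parallelize(Seq(
>   fact(20151123, 18, 35, "room1", 18.6),
>   fact(20151123, 18, 35, "room2", 22.4),
>   fact(20151123, 18, 36, "room1", 17.4),
>   fact(20151123, 18, 36, "room2", 25.6)
> )).toDF()
>
> // Cube over all four dimensions, averaging temp in every grouping set.
> val cube0 = df0.cube("date", "hour", "minute", "room_name").agg(Map("temp" -> "avg"))
>
> // Expected: the subtotal rows in which date is null. Actual (1.5.1): an empty result.
> cube0.where("date IS NULL").show()
> {code}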
> The query result is empty. It should not be, because cube0 contains null several times in column 'date'. The issue arises because the cube function reuses the schema information from df0, where 'date' (an Int field in the case class) is marked as non-nullable. If I change the parameter types in the case class to Option[T], the query gives correct results.
> Solution: the cube function should set the nullable property to true in its output schema for the columns (dimensions) passed as method parameters.
> I am new to Scala and Spark and don't know how to implement this. Could somebody please do it?

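A toy model of why the filter comes back empty, in plain Scala with no Spark dependency. It assumes, as we understand Spark 1.5's optimizer to behave (its NullPropagation rule), that `col IS NULL` on a column whose schema says nullable = false is folded to the constant false before any row is examined. `CubeNullDemo`, `Column`, `Row` and `whereIsNull` are hypothetical names, and only two of the four cube dimensions are modeled.

```scala
object CubeNullDemo {
  // Hypothetical column descriptor: a name plus the schema's nullable flag.
  final case class Column(name: String, nullable: Boolean)

  // A row maps column names to values; None stands for SQL null.
  type Row = Map[String, Option[Any]]

  // Evaluates `col IS NULL`, first applying the assumed optimizer rewrite:
  // on a column declared non-nullable the predicate becomes the constant false.
  def whereIsNull(schema: Seq[Column], col: String, rows: Seq[Row]): Seq[Row] = {
    val c = schema.find(_.name == col).getOrElse(
      throw new IllegalArgumentException(s"unknown column: $col"))
    if (!c.nullable) Seq.empty             // folded to false: no row can match
    else rows.filter(r => r(col).isEmpty)  // evaluated row by row
  }

  // Rows of the kind cube0 produces for two of its dimensions: a detail row,
  // a subtotal over all dates for room1, and the grand total.
  val cubeRows: Seq[Row] = Seq(
    Map("date" -> Some(20151123), "room_name" -> Some("room1")),
    Map("date" -> None, "room_name" -> Some("room1")),
    Map("date" -> None, "room_name" -> None)
  )

  // df0's schema: date comes from an Int field, so it is non-nullable.
  val inheritedSchema = Seq(Column("date", nullable = false), Column("room_name", nullable = true))
  // What the cube output should declare: every dimension nullable.
  val nullableSchema = Seq(Column("date", nullable = true), Column("room_name", nullable = true))

  def main(args: Array[String]): Unit = {
    println(s"inherited schema: ${whereIsNull(inheritedSchema, "date", cubeRows).size} rows")
    println(s"nullable schema:  ${whereIsNull(nullableSchema, "date", cubeRows).size} rows")
  }
}
```

Running `main` reports 0 rows for the inherited schema and 2 for the nullable one, mirroring the empty result in the report and the correct result obtained with Option[T] fields (which yield nullable columns).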

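The proposed solution can be sketched on a toy schema in plain Scala: the grouping columns of a cube's output are copied from the input schema with nullable forced to true, since the subtotal rows hold null in the dimensions they aggregate over. `CubeSchemaSketch`, `Field` and `groupingColumns` are hypothetical stand-ins rather than Spark's API; in Spark the corresponding type is `StructField`, a case class whose `nullable` flag can be changed the same way with `copy(nullable = true)`. Aggregate columns such as the average of temp are not modeled.

```scala
object CubeSchemaSketch {
  // Hypothetical stand-in for Spark's StructField (name, type, nullable flag).
  final case class Field(name: String, dataType: String, nullable: Boolean)

  // Grouping part of a cube's output schema: each dimension is taken from the
  // input schema and marked nullable, because subtotal rows hold null there.
  def groupingColumns(input: Seq[Field], dims: Seq[String]): Seq[Field] =
    dims.map { d =>
      input.find(_.name == d) match {
        case Some(f) => f.copy(nullable = true)
        case None    => throw new IllegalArgumentException(s"unknown column: $d")
      }
    }

  // Schema df0 gets from the case class: primitive Int/Double fields are
  // non-nullable, String is nullable.
  val df0Schema: Seq[Field] = Seq(
    Field("date", "int", nullable = false),
    Field("hour", "int", nullable = false),
    Field("minute", "int", nullable = false),
    Field("room_name", "string", nullable = true),
    Field("temp", "double", nullable = false)
  )

  def main(args: Array[String]): Unit =
    groupingColumns(df0Schema, Seq("date", "hour", "minute", "room_name")).foreach(println)
}
```

The input schema is left untouched; only the cube's output declares the dimensions nullable, so a filter such as `date IS NULL` on the cube result is no longer treated as always false.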

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


