spark-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Munesh Bandaru (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SPARK-15918) unionAll returns wrong result when two dataframes has schema in different order
Date Fri, 01 Sep 2017 21:33:00 GMT

    [ https://issues.apache.org/jira/browse/SPARK-15918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16151185#comment-16151185
] 

Munesh Bandaru commented on SPARK-15918:
----------------------------------------

As the ticket was closed as 'Not a Problem', a workaround is to use the 'select' to change
the order of the columns of one of the dataframe.
But if we have a large number of columns, it doesn't look good to specify all the columns.
So we can use the columns of one dataframe to arrange the other dataframe in the order as
below.
{{comb_df = df1.unionAll(df2.select(df1.columns))}}

> unionAll returns wrong result when two dataframes has schema in different order
> -------------------------------------------------------------------------------
>
>                 Key: SPARK-15918
>                 URL: https://issues.apache.org/jira/browse/SPARK-15918
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.6.1
>         Environment: CentOS
>            Reporter: Prabhu Joseph
>
> On applying unionAll operation between A and B dataframes, they both has same schema
but in different order and hence the result has column value mapping changed.
> Repro:
> {code}
> A.show()
> +---+--------+-------+------+------+-----+----+-------+------+-------+-------+-----+
> |tag|year_day|tm_hour|tm_min|tm_sec|dtype|time|tm_mday|tm_mon|tm_yday|tm_year|value|
> +---+--------+-------+------+------+-----+----+-------+------+-------+-------+-----+
> +---+--------+-------+------+------+-----+----+-------+------+-------+-------+-----+
> B.show()
> +-----+-------------------+----------+-------+-------+------+------+------+-------+-------+------+--------+
> |dtype|                tag|      time|tm_hour|tm_mday|tm_min|tm_mon|tm_sec|tm_yday|tm_year|
value|year_day|
> +-----+-------------------+----------+-------+-------+------+------+------+-------+-------+------+--------+
> |    F|C_FNHXUT701Z.CNSTLO|1443790800|     13|      2|     0|    10|     0|    275| 
 2015|1.2345| 2015275|
> |    F|C_FNHXUDP713.CNSTHI|1443790800|     13|      2|     0|    10|     0|    275| 
 2015|1.2345| 2015275|
> |    F| C_FNHXUT718.CNSTHI|1443790800|     13|      2|     0|    10|     0|    275| 
 2015|1.2345| 2015275|
> |    F|C_FNHXUT703Z.CNSTLO|1443790800|     13|      2|     0|    10|     0|    275| 
 2015|1.2345| 2015275|
> |    F|C_FNHXUR716A.CNSTLO|1443790800|     13|      2|     0|    10|     0|    275| 
 2015|1.2345| 2015275|
> |    F|C_FNHXUT803Z.CNSTHI|1443790800|     13|      2|     0|    10|     0|    275| 
 2015|1.2345| 2015275|
> |    F| C_FNHXUT728.CNSTHI|1443790800|     13|      2|     0|    10|     0|    275| 
 2015|1.2345| 2015275|
> |    F| C_FNHXUR806.CNSTHI|1443790800|     13|      2|     0|    10|     0|    275| 
 2015|1.2345| 2015275|
> +-----+-------------------+----------+-------+-------+------+------+------+-------+-------+------+--------+
> A = A.unionAll(B)
> A.show()
> +---+-------------------+----------+------+------+-----+----+-------+------+-------+-------+---------+
> |tag|           year_day|   tm_hour|tm_min|tm_sec|dtype|time|tm_mday|tm_mon|tm_yday|tm_year|
   value|
> +---+-------------------+----------+------+------+-----+----+-------+------+-------+-------+---------+
> |  F|C_FNHXUT701Z.CNSTLO|1443790800|    13|     2|    0|  10|      0|   275|   2015|
1.2345|2015275.0|
> |  F|C_FNHXUDP713.CNSTHI|1443790800|    13|     2|    0|  10|      0|   275|   2015|
1.2345|2015275.0|
> |  F| C_FNHXUT718.CNSTHI|1443790800|    13|     2|    0|  10|      0|   275|   2015|
1.2345|2015275.0|
> |  F|C_FNHXUT703Z.CNSTLO|1443790800|    13|     2|    0|  10|      0|   275|   2015|
1.2345|2015275.0|
> |  F|C_FNHXUR716A.CNSTLO|1443790800|    13|     2|    0|  10|      0|   275|   2015|
1.2345|2015275.0|
> |  F|C_FNHXUT803Z.CNSTHI|1443790800|    13|     2|    0|  10|      0|   275|   2015|
1.2345|2015275.0|
> |  F| C_FNHXUT728.CNSTHI|1443790800|    13|     2|    0|  10|      0|   275|   2015|
1.2345|2015275.0|
> |  F| C_FNHXUR806.CNSTHI|1443790800|    13|     2|    0|  10|      0|   275|   2015|
1.2345|2015275.0|
> +---+-------------------+----------+------+------+-----+----+-------+------+-------+-------+---------+
> {code}
> On changing the schema of A according to B and doing unionAll works fine
> {code}
> C = A.select("dtype","tag","time","tm_hour","tm_mday","tm_min",”tm_mon”,"tm_sec","tm_yday","tm_year","value","year_day")
> A = C.unionAll(B)
> A.show()
> +-----+-------------------+----------+-------+-------+------+------+------+-------+-------+------+--------+
> |dtype|                tag|      time|tm_hour|tm_mday|tm_min|tm_mon|tm_sec|tm_yday|tm_year|
value|year_day|
> +-----+-------------------+----------+-------+-------+------+------+------+-------+-------+------+--------+
> |    F|C_FNHXUT701Z.CNSTLO|1443790800|     13|      2|     0|    10|     0|    275| 
 2015|1.2345| 2015275|
> |    F|C_FNHXUDP713.CNSTHI|1443790800|     13|      2|     0|    10|     0|    275| 
 2015|1.2345| 2015275|
> |    F| C_FNHXUT718.CNSTHI|1443790800|     13|      2|     0|    10|     0|    275| 
 2015|1.2345| 2015275|
> |    F|C_FNHXUT703Z.CNSTLO|1443790800|     13|      2|     0|    10|     0|    275| 
 2015|1.2345| 2015275|
> |    F|C_FNHXUR716A.CNSTLO|1443790800|     13|      2|     0|    10|     0|    275| 
 2015|1.2345| 2015275|
> |    F|C_FNHXUT803Z.CNSTHI|1443790800|     13|      2|     0|    10|     0|    275| 
 2015|1.2345| 2015275|
> |    F| C_FNHXUT728.CNSTHI|1443790800|     13|      2|     0|    10|     0|    275| 
 2015|1.2345| 2015275|
> |    F| C_FNHXUR806.CNSTHI|1443790800|     13|      2|     0|    10|     0|    275| 
 2015|1.2345| 2015275|
> +-----+-------------------+----------+-------+-------+------+------+------+-------+-------+------+--------+
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org


Mime
View raw message