spark-issues mailing list archives

From "Marcelo Vanzin (JIRA)" <j...@apache.org>
Subject [jira] [Resolved] (SPARK-13141) Dataframe created from Hive partitioned tables using HiveContext returns wrong results
Date Wed, 02 Mar 2016 02:50:18 GMT

     [ https://issues.apache.org/jira/browse/SPARK-13141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Marcelo Vanzin resolved SPARK-13141.
------------------------------------
    Resolution: Not A Problem

Hi, this was a bug in CDH 5.5.0/5.5.1; it was fixed in CDH 5.5.2. Sorry about the trouble.

> Dataframe created from Hive partitioned tables using HiveContext returns wrong results
> --------------------------------------------------------------------------------------
>
>                 Key: SPARK-13141
>                 URL: https://issues.apache.org/jira/browse/SPARK-13141
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.5.0
>         Environment: CDH 5.5.1
>            Reporter: Simone
>            Priority: Critical
>
> I get wrong DataFrame results using HiveContext with Spark 1.5.0 on CDH 5.5.1 in yarn-client mode.
> The problem occurs with partitioned tables on delimited text data in HDFS, from both Scala and Python.
> This is an example:
> import org.apache.spark.sql.hive.HiveContext
> val hc = new HiveContext(sc)
> hc.table("my_db.partition_table").show()
> The result is that every value in every row is NULL, except for the first column (which contains the whole line of data) and the partitioning columns, which appear to be correct.
> With Hive and Impala I get correct results.
> Spark also returns correct results for the same data in a non-partitioned table.
> I think a similar problem also occurs with Avro data:
> https://community.cloudera.com/t5/Advanced-Analytics-Apache-Spark/Pyspark-Table-Dataframe-returning-empty-records-from-Partitioned/td-p/35836
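
The symptom described above (whole line in the first column, NULL everywhere else) is what a delimited-text reader produces when it splits on the wrong field delimiter. The thread does not state the root cause of the CDH bug, so the following is only a minimal sketch of that general mechanism, written in plain Python without Spark. The sample record, the function names `parse_line` and `looks_misparsed`, and the comma delimiter are all hypothetical; the only external fact used is that Hive's default field delimiter is Ctrl-A (`\x01`).

```python
from typing import List, Optional


def parse_line(line: str, delimiter: str, num_columns: int) -> List[Optional[str]]:
    """Split one text line into exactly num_columns fields, padding with None."""
    fields: List[Optional[str]] = list(line.split(delimiter))
    fields = fields[:num_columns]
    return fields + [None] * (num_columns - len(fields))


def looks_misparsed(data_columns: List[Optional[str]], delimiter: str) -> bool:
    """True when the first column still contains the delimiter and the rest are NULL."""
    first, rest = data_columns[0], data_columns[1:]
    return (
        first is not None
        and delimiter in first
        and len(rest) > 0
        and all(value is None for value in rest)
    )


# Hypothetical comma-delimited record with three data columns.
line = "1,alice,2016-01-01"

good = parse_line(line, ",", 3)     # the table's delimiter is honoured
bad = parse_line(line, "\x01", 3)   # the delimiter is ignored; Hive's default (Ctrl-A) is used

print(good, looks_misparsed(good, ","))
print(bad, looks_misparsed(bad, ","))
```

A check like `looks_misparsed` could be applied to the non-partition columns of a few collected rows to confirm that a table shows this pattern before comparing the result against Hive or Impala.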



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org

