spark-issues mailing list archives

From "Kevin Jung (JIRA)" <j...@apache.org>
Subject [jira] [Created] (SPARK-5737) Scanning duplicate columns from parquet table
Date Wed, 11 Feb 2015 08:26:11 GMT
Kevin Jung created SPARK-5737:
---------------------------------

             Summary: Scanning duplicate columns from parquet table
                 Key: SPARK-5737
                 URL: https://issues.apache.org/jira/browse/SPARK-5737
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 1.2.1
            Reporter: Kevin Jung


{quote}
import org.apache.spark.sql._
val sqlContext = new SQLContext(sc)
import sqlContext._
val rdd = sqlContext.parquetFile("temp.parquet")
rdd.select('d1,'d1,'d2,'d2).take(3).foreach(println)
{quote}

In the results of the above code, the first column of each duplicated pair contains null values.
For example,

{quote}
[null,-5.7,null,121.05]
[null,-61.17,null,108.91]
[null,50.60,null,72.15]
{quote}

This happens only in ParquetTableScan. PhysicalRDD works fine, and its rows contain the duplicated values as expected,
like...

{quote}
[-5.7,-5.7,121.05,121.05]
[-61.17,-61.17,108.91,108.91]
[50.60,50.60,72.15,72.15]
{quote}
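A possible workaround until this is fixed may be to alias the duplicated columns so each projected attribute has a distinct name. This is an untested sketch against 1.2.1; the alias names ('d1a, 'd1b, 'd2a, 'd2b) are hypothetical:

{quote}
import org.apache.spark.sql._
val sqlContext = new SQLContext(sc)
import sqlContext._
val rdd = sqlContext.parquetFile("temp.parquet")
// hypothetical aliases so ParquetTableScan resolves four distinct attributes
rdd.select('d1 as 'd1a, 'd1 as 'd1b, 'd2 as 'd2a, 'd2 as 'd2b).take(3).foreach(println)
{quote}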



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org

