spark-issues mailing list archives

From "Kevin Jung (JIRA)" <>
Subject [jira] [Created] (SPARK-5737) Scanning duplicate columns from parquet table
Date Wed, 11 Feb 2015 08:26:11 GMT
Kevin Jung created SPARK-5737:

             Summary: Scanning duplicate columns from parquet table
                 Key: SPARK-5737
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 1.2.1
            Reporter: Kevin Jung

import org.apache.spark.sql._
val sqlContext = new SQLContext(sc)
import sqlContext._
val rdd = sqlContext.parquetFile("temp.parquet")
rdd.select('d1, 'd1, 'd2, 'd2).take(3).foreach(println)

In the results of the above code, the first of each pair of duplicated columns comes back null.
For example,


This happens only in ParquetTableScan. PhysicalRDD works fine, and its rows contain the duplicated values.
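The symptom is consistent with the scan deduplicating the requested columns by name and filling only one output slot per name, leaving the earlier duplicate slots null. This is a minimal, self-contained sketch of that hypothesis (plain Scala, not actual ParquetTableScan code; the names and the "last occurrence wins" rule are assumptions for illustration):

```scala
// Hypothetical model of the suspected behavior: requested columns are
// deduplicated by name, and each fetched value is written only into the
// LAST slot that asked for it, so earlier duplicate slots stay null.
object DuplicateColumnScan {
  def scan(requested: Seq[String], row: Map[String, Any]): Array[Any] = {
    val out = Array.fill[Any](requested.length)(null)
    // Seq#toMap keeps the last index for each duplicated name.
    val lastIndex: Map[String, Int] = requested.zipWithIndex.toMap
    for ((name, idx) <- lastIndex) out(idx) = row(name)
    out
  }

  def main(args: Array[String]): Unit = {
    val row = Map("d1" -> 1, "d2" -> 2)
    // Mirrors select('d1, 'd1, 'd2, 'd2): the first copy of each column is null.
    println(scan(Seq("d1", "d1", "d2", "d2"), row).mkString("[", ",", "]"))
    // prints [null,1,null,2]
  }
}
```

A correct scan would instead fill every requested slot, duplicates included, which matches what PhysicalRDD produces.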


This message was sent by Atlassian JIRA
