spark-issues mailing list archives

From "Yin Huai (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SPARK-7737) parquet schema discovery should not fail because of empty _temporary dir
Date Thu, 21 May 2015 18:11:17 GMT

    [ https://issues.apache.org/jira/browse/SPARK-7737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14554769#comment-14554769
] 

Yin Huai commented on SPARK-7737:
---------------------------------

It seems https://github.com/apache/spark/pull/6287 still does not fix partition discovery completely.
For the directory layout in the description, the following call works:
{code}
load("/partitions5k/i=2/", "parquet")
{code}
However, this call fails:
{code}
load("/partitions5k/", "parquet")
{code}
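The difference between the two calls is that the failing one makes partition discovery walk past the `_temporary` directory. A plausible shape of the fix (a sketch only, with hypothetical helper names; this is not the actual Spark patch) is to drop any path containing a component that starts with `_` or `.` before inferring partitions, since those are Hadoop metadata (`_temporary`, `_SUCCESS`), not data:
{code}
# Sketch: filter out Hadoop metadata paths before partition discovery.
# Helper names are illustrative, not Spark's actual API.

def is_data_path(path):
    """Return False if any path component starts with '_' or '.'."""
    return not any(part.startswith(("_", "."))
                   for part in path.strip("/").split("/"))

def leaf_paths_for_discovery(paths):
    """Keep only paths that should participate in partition inference."""
    return [p for p in paths if is_data_path(p)]

paths = [
    "/partitions5k/i=2/_SUCCESS",
    "/partitions5k/i=2/_temporary/",
    "/partitions5k/i=2/part-r-00001.gz.parquet",
]
print(leaf_paths_for_discovery(paths))
# -> ['/partitions5k/i=2/part-r-00001.gz.parquet']
{code}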

> parquet schema discovery should not fail because of empty _temporary dir 
> -------------------------------------------------------------------------
>
>                 Key: SPARK-7737
>                 URL: https://issues.apache.org/jira/browse/SPARK-7737
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.4.0
>            Reporter: Yin Huai
>            Assignee: Cheng Lian
>            Priority: Blocker
>
> Parquet schema discovery will fail when the dir is like 
> {code}
> /partitions5k/i=2/_SUCCESS
> /partitions5k/i=2/_temporary/
> /partitions5k/i=2/part-r-00001.gz.parquet
> /partitions5k/i=2/part-r-00002.gz.parquet
> /partitions5k/i=2/part-r-00003.gz.parquet
> /partitions5k/i=2/part-r-00004.gz.parquet
> {code}
> {code}
> java.lang.AssertionError: assertion failed: Conflicting partition column names detected:
> 	
> 	at scala.Predef$.assert(Predef.scala:179)
> 	at org.apache.spark.sql.sources.PartitioningUtils$.resolvePartitions(PartitioningUtils.scala:159)
> 	at org.apache.spark.sql.sources.PartitioningUtils$.parsePartitions(PartitioningUtils.scala:71)
> 	at org.apache.spark.sql.sources.HadoopFsRelation.org$apache$spark$sql$sources$HadoopFsRelation$$discoverPartitions(interfaces.scala:468)
> 	at org.apache.spark.sql.sources.HadoopFsRelation$$anonfun$partitionSpec$3.apply(interfaces.scala:424)
> 	at org.apache.spark.sql.sources.HadoopFsRelation$$anonfun$partitionSpec$3.apply(interfaces.scala:423)
> 	at scala.Option.getOrElse(Option.scala:120)
> 	at org.apache.spark.sql.sources.HadoopFsRelation.partitionSpec(interfaces.scala:422)
> 	at org.apache.spark.sql.sources.HadoopFsRelation.schema$lzycompute(interfaces.scala:482)
> 	at org.apache.spark.sql.sources.HadoopFsRelation.schema(interfaces.scala:480)
> 	at org.apache.spark.sql.sources.LogicalRelation.<init>(LogicalRelation.scala:30)
> 	at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:134)
> 	at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:118)
> 	at org.apache.spark.sql.SQLContext.load(SQLContext.scala:1135)
> {code}
> 1.3 works fine.
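Why the assertion fires: partition discovery derives column names from `key=value` directory components, and the quoted trace shows one path contributing an empty column list (the blank line after "Conflicting partition column names detected:"). A rough sketch of that parsing, assuming (this is illustrative Python, not Spark's actual `parsePartition` code) it walks from the deepest component upward and stops at the first component that is not `key=value`:
{code}
def partition_columns(dir_path):
    # Collect key=value column names from the deepest component upward,
    # stopping at the first non key=value component (illustrative only).
    cols = []
    for comp in reversed(dir_path.strip("/").split("/")):
        if "=" not in comp:
            break
        cols.append(comp.split("=", 1)[0])
    return list(reversed(cols))

print(partition_columns("/partitions5k/i=2"))             # -> ['i']
print(partition_columns("/partitions5k/i=2/_temporary"))  # -> []
{code}
Under this reading, data directories yield `['i']` while the `_temporary` leaf yields `[]`, and `resolvePartitions` rejects the mismatch.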



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org

