spark-dev mailing list archives

From Liang-Chi Hsieh <vii...@gmail.com>
Subject Re: Skip Corrupted Parquet blocks / footer.
Date Thu, 05 Jan 2017 05:11:29 GMT

After checking the code, I think there are a few issues with the
ignoreCorruptFiles config, so you can't actually use it with Parquet files
right now.

I opened a JIRA https://issues.apache.org/jira/browse/SPARK-19082 and also
submitted a PR for it.
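
Until a fix is available, one possible stopgap is to filter out obviously
broken files before handing the paths to the reader. The following is only a
minimal sketch and not what the PR does: it assumes the files are on a local
filesystem (for HDFS you would need the Hadoop FileSystem API instead), and
ParquetMagicCheck and hasParquetTailMagic are hypothetical names made up for
illustration. It checks only the 4-byte tail magic "PAR1", i.e. the
[80, 65, 82, 49] from the error quoted below, so a file with a valid tail but
a damaged footer would still pass.

```scala
import java.io.RandomAccessFile
import java.nio.charset.StandardCharsets

// Hypothetical helper, not part of Spark or Parquet.
object ParquetMagicCheck {
  // "PAR1" in ASCII is [80, 65, 82, 49], the tail magic named in the error.
  val Magic: Array[Byte] = "PAR1".getBytes(StandardCharsets.US_ASCII)

  /** Returns true if the last four bytes of the local file equal "PAR1". */
  def hasParquetTailMagic(path: String): Boolean = {
    val file = new RandomAccessFile(path, "r")
    try {
      val len = file.length()
      if (len < Magic.length) {
        false // too short to contain the magic at all
      } else {
        val tail = new Array[Byte](Magic.length)
        file.seek(len - Magic.length)
        file.readFully(tail)
        java.util.Arrays.equals(tail, Magic)
      }
    } finally {
      file.close()
    }
  }
}
```

With a Seq[String] of candidate paths, the survivors of
paths.filter(ParquetMagicCheck.hasParquetTailMagic) could then be passed to
sqlContext.read.parquet(goodPaths: _*).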


khyati wrote
> Hi Reynold Xin,
> 
> In Spark 2.1.0, I tried setting spark.sql.files.ignoreCorruptFiles = true
> using either of the following commands:
> 
> val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc)
> 
> sqlContext.setConf("spark.sql.files.ignoreCorruptFiles", "true")
> // or
> sqlContext.sql("set spark.sql.files.ignoreCorruptFiles=true")
> 
> but I still get an error when reading Parquet files with:
> 
> val newDataDF = sqlContext.read.parquet(
>   "/data/tempparquetdata/corruptblock.0",
>   "/data/tempparquetdata/data1.parquet")
> 
> Error: ERROR executor.Executor: Exception in task 0.0 in stage 4.0 (TID 4)
> java.io.IOException: Could not read footer: java.lang.RuntimeException:
> hdfs://192.168.1.53:9000/data/tempparquetdata/corruptblock.0 is not a
> Parquet file. expected magic number at tail [80, 65, 82, 49] but found
> [65, 82, 49, 10]
> 	at org.apache.parquet.hadoop.ParquetFileReader.readAllFootersInParallel(ParquetFileReader.java:248)
> 
> 
> Please let me know if I am missing anything.





-----
Liang-Chi Hsieh | @viirya 
Spark Technology Center 
http://www.spark.tc/ 
--
View this message in context: http://apache-spark-developers-list.1001551.n3.nabble.com/Skip-Corrupted-Parquet-blocks-footer-tp20418p20466.html