parquet-commits mailing list archives

From b...@apache.org
Subject parquet-mr git commit: PARQUET-151: Skip writing _metadata file in case of no footers since schema cannot be determined.
Date Mon, 01 Jun 2015 21:22:22 GMT
Repository: parquet-mr
Updated Branches:
  refs/heads/master 33a220260 -> 4b5cda5a2


PARQUET-151: Skip writing _metadata file in case of no footers since schema cannot be determined.

This fixes an NPE seen during mergeFooters in such a case.
In this scenario the onus of writing any summary files lies with the caller, which might have
a global schema available. For example, Spark does this when persisting an empty RDD.

Author: Yash Datta <Yash.Datta@guavus.com>

Closes #205 from saucam/footer_bug and squashes the following commits:

b2b3ddf [Yash Datta] PARQUET-151: Skip writing _metadata file in case of no footers since
schema cannot be determined. This fixes an NPE seen during mergeFooters in such a case.
In this scenario the onus of writing any summary files lies with the caller (it might
have some global schema available).


Project: http://git-wip-us.apache.org/repos/asf/parquet-mr/repo
Commit: http://git-wip-us.apache.org/repos/asf/parquet-mr/commit/4b5cda5a
Tree: http://git-wip-us.apache.org/repos/asf/parquet-mr/tree/4b5cda5a
Diff: http://git-wip-us.apache.org/repos/asf/parquet-mr/diff/4b5cda5a

Branch: refs/heads/master
Commit: 4b5cda5a2c6ca613db5129d50ffffce2604ad9eb
Parents: 33a2202
Author: Yash Datta <Yash.Datta@guavus.com>
Authored: Mon Jun 1 14:21:53 2015 -0700
Committer: Ryan Blue <blue@apache.org>
Committed: Mon Jun 1 14:21:53 2015 -0700

----------------------------------------------------------------------
 .../java/org/apache/parquet/hadoop/ParquetOutputCommitter.java  | 5 +++++
 1 file changed, 5 insertions(+)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/parquet-mr/blob/4b5cda5a/parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ParquetOutputCommitter.java
----------------------------------------------------------------------
diff --git a/parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ParquetOutputCommitter.java b/parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ParquetOutputCommitter.java
index a1589c0..9a0930a 100644
--- a/parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ParquetOutputCommitter.java
+++ b/parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ParquetOutputCommitter.java
@@ -54,6 +54,11 @@ public class ParquetOutputCommitter extends FileOutputCommitter {
         final FileSystem fileSystem = outputPath.getFileSystem(configuration);
         FileStatus outputStatus = fileSystem.getFileStatus(outputPath);
         List<Footer> footers = ParquetFileReader.readAllFootersInParallel(configuration, outputStatus);
+        // If there are no footers, _metadata file cannot be written since there is no way to determine schema!
+        // Onus of writing any summary files lies with the caller in this case.
+        if (footers.isEmpty()) {
+          return;
+        }
         try {
           ParquetFileWriter.writeMetadataFile(configuration, outputPath, footers);
         } catch (Exception e) {
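
The guard added above can be illustrated with a minimal, self-contained sketch that needs neither Parquet nor Hadoop. Everything in it is an assumption made for illustration: `Footer`, `writeSummaryIfPossible`, and the `sink` list are hypothetical stand-ins, not parquet-mr API. In the real committer, the write step is the call to `ParquetFileWriter.writeMetadataFile` shown in the diff.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class SummaryGuardSketch {

    /** Hypothetical stand-in for a Parquet footer; it carries only a schema string. */
    static final class Footer {
        final String schema;

        Footer(String schema) {
            this.schema = schema;
        }
    }

    /**
     * Appends a summary entry to sink unless there are no footers.
     * With no footers there is no schema to derive, so nothing is written
     * and the caller is left to write a summary itself if it has a schema.
     *
     * @return true if a summary entry was written, false if it was skipped
     */
    static boolean writeSummaryIfPossible(List<Footer> footers, List<String> sink) {
        if (footers.isEmpty()) {
            return false; // same early exit as the guard in the commit
        }
        // Illustration only: take the schema from the first footer.
        // The real code merges all footers instead.
        sink.add(footers.get(0).schema);
        return true;
    }

    public static void main(String[] args) {
        List<String> sink = new ArrayList<>();

        // Empty output directory: the write is skipped.
        List<Footer> none = Collections.emptyList();
        System.out.println(writeSummaryIfPossible(none, sink));

        // One footer: the summary entry is written.
        List<Footer> one =
            Collections.singletonList(new Footer("message m { required int32 id; }"));
        System.out.println(writeSummaryIfPossible(one, sink));
    }
}
```

Returning a boolean here is a choice made for the sketch so that the skip is observable; the committer method in the diff simply returns without a value.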

