parquet-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From alexleven...@apache.org
Subject incubator-parquet-mr git commit: PARQUET-136: NPE thrown in StatisticsFilter when all values in a string/binary column trunk are null
Date Tue, 27 Jan 2015 02:22:31 GMT
Repository: incubator-parquet-mr
Updated Branches:
  refs/heads/master d70fdbc40 -> 4bf9be34a


PARQUET-136: NPE thrown in StatisticsFilter when all values in a string/binary column trunk
are null

In case of all nulls in a binary column, statistics object read from file metadata is empty,
and should return true for all nulls check for the column. Even if column has no values, it
can be ignored.

The other way is to fix this behaviour in the writer, but is that what we want ?

Author: Yash Datta <Yash.Datta@guavus.com>
Author: Alex Levenson <alexlevenson@twitter.com>
Author: Yash Datta <saucam@gmail.com>

Closes #99 from saucam/npe and squashes the following commits:

5138e44 [Yash Datta] PARQUET-136: Remove unreachable block
b17cd38 [Yash Datta] Revert "PARQUET-161: Trigger tests"
82209e6 [Yash Datta] PARQUET-161: Trigger tests
aab2f81 [Yash Datta] PARQUET-161: Review comments for the test case
2217ee2 [Yash Datta] PARQUET-161: Add a test case for checking the correct statistics info
is recorded in case of all nulls in a column
c2f8d6f [Yash Datta] PARQUET-161: Fix the write path to write statistics object in case of
only nulls in the column
97bb517 [Yash Datta] Revert "revert TestStatisticsFilter.java"
a06f0d0 [Yash Datta] Merge pull request #1 from isnotinvain/alexlevenson/PARQUET-161-136
b1001eb [Alex Levenson] Fix statistics isEmpty, handle more edge cases in statistics filter
0c88be0 [Alex Levenson] revert TestStatisticsFilter.java
1ac9192 [Yash Datta] PARQUET-136: Its better to not filter chunks for which empty statistics
object is returned. Empty statistics can be read in case of 1. pre-statistics files, 2. files
written from current writer that has a bug, as it does not write the statistics if column
has all nulls
e5e924e [Yash Datta] Revert "PARQUET-136: In case of all nulls in a binary column, statistics
object read from file metadata is empty, and should return true for all nulls check for the
column"
8cc5106 [Yash Datta] Revert "PARQUET-136: fix hasNulls to cater to the case where all values
are nulls"
c7c126f [Yash Datta] PARQUET-136: fix hasNulls to cater to the case where all values are nulls
974a22b [Yash Datta] PARQUET-136: In case of all nulls in a binary column, statistics object
read from file metadata is empty, and should return true for all nulls check for the column


Project: http://git-wip-us.apache.org/repos/asf/incubator-parquet-mr/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-parquet-mr/commit/4bf9be34
Tree: http://git-wip-us.apache.org/repos/asf/incubator-parquet-mr/tree/4bf9be34
Diff: http://git-wip-us.apache.org/repos/asf/incubator-parquet-mr/diff/4bf9be34

Branch: refs/heads/master
Commit: 4bf9be34a87b51d07e0b0c9e74831bbcdbce0f74
Parents: d70fdbc
Author: Yash Datta <Yash.Datta@guavus.com>
Authored: Mon Jan 26 18:21:11 2015 -0800
Committer: Alex Levenson <alexlevenson@twitter.com>
Committed: Mon Jan 26 18:21:11 2015 -0800

----------------------------------------------------------------------
 .../column/statistics/BinaryStatistics.java     | 12 ++--
 .../column/statistics/BooleanStatistics.java    | 10 ++--
 .../column/statistics/DoubleStatistics.java     | 10 ++--
 .../column/statistics/FloatStatistics.java      |  8 ++-
 .../column/statistics/IntStatistics.java        |  8 ++-
 .../column/statistics/LongStatistics.java       | 10 ++--
 .../parquet/column/statistics/Statistics.java   | 24 ++++++--
 .../statisticslevel/StatisticsFilter.java       | 63 ++++++++++++++++----
 .../converter/ParquetMetadataConverter.java     | 10 +++-
 .../statisticslevel/TestStatisticsFilter.java   |  8 +--
 .../parquet/hadoop/TestParquetFileWriter.java   | 36 +++++++++++
 11 files changed, 152 insertions(+), 47 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-parquet-mr/blob/4bf9be34/parquet-column/src/main/java/parquet/column/statistics/BinaryStatistics.java
----------------------------------------------------------------------
diff --git a/parquet-column/src/main/java/parquet/column/statistics/BinaryStatistics.java
b/parquet-column/src/main/java/parquet/column/statistics/BinaryStatistics.java
index f125b2f..6ef1678 100644
--- a/parquet-column/src/main/java/parquet/column/statistics/BinaryStatistics.java
+++ b/parquet-column/src/main/java/parquet/column/statistics/BinaryStatistics.java
@@ -24,7 +24,7 @@ public class BinaryStatistics extends Statistics<Binary> {
 
   @Override
   public void updateStats(Binary value) {
-    if (this.isEmpty()) {
+    if (!this.hasNonNullValue()) {
       initializeStats(value, value);
     } else {
       updateStats(value, value);
@@ -34,7 +34,7 @@ public class BinaryStatistics extends Statistics<Binary> {
   @Override
   public void mergeStatisticsMinMax(Statistics stats) {
     BinaryStatistics binaryStats = (BinaryStatistics)stats;
-    if (this.isEmpty()) {
+    if (!this.hasNonNullValue()) {
       initializeStats(binaryStats.getMin(), binaryStats.getMax());
     } else {
       updateStats(binaryStats.getMin(), binaryStats.getMax());
@@ -60,9 +60,11 @@ public class BinaryStatistics extends Statistics<Binary> {
 
   @Override
   public String toString() {
-    if(!this.isEmpty())
+    if (this.hasNonNullValue())
       return String.format("min: %s, max: %s, num_nulls: %d", min.toStringUsingUTF8(), max.toStringUsingUTF8(),
this.getNumNulls());
-    else
+   else if (!this.isEmpty())
+      return String.format("num_nulls: %d, min/max not defined", this.getNumNulls());
+   else
       return "no stats for this column";
   }
 
@@ -100,4 +102,4 @@ public class BinaryStatistics extends Statistics<Binary> {
     this.min = min;
     this.markAsNotEmpty();
   }
-}
\ No newline at end of file
+}

http://git-wip-us.apache.org/repos/asf/incubator-parquet-mr/blob/4bf9be34/parquet-column/src/main/java/parquet/column/statistics/BooleanStatistics.java
----------------------------------------------------------------------
diff --git a/parquet-column/src/main/java/parquet/column/statistics/BooleanStatistics.java
b/parquet-column/src/main/java/parquet/column/statistics/BooleanStatistics.java
index 6741343..8e5b233 100644
--- a/parquet-column/src/main/java/parquet/column/statistics/BooleanStatistics.java
+++ b/parquet-column/src/main/java/parquet/column/statistics/BooleanStatistics.java
@@ -24,7 +24,7 @@ public class BooleanStatistics extends Statistics<Boolean> {
 
   @Override
   public void updateStats(boolean value) {
-    if (this.isEmpty()) {
+    if (!this.hasNonNullValue()) {
       initializeStats(value, value);
     } else {
       updateStats(value, value);
@@ -34,7 +34,7 @@ public class BooleanStatistics extends Statistics<Boolean> {
   @Override
   public void mergeStatisticsMinMax(Statistics stats) {
     BooleanStatistics boolStats = (BooleanStatistics)stats;
-    if (this.isEmpty()) {
+    if (!this.hasNonNullValue()) {
       initializeStats(boolStats.getMin(), boolStats.getMax());
     } else {
       updateStats(boolStats.getMin(), boolStats.getMax());
@@ -60,9 +60,11 @@ public class BooleanStatistics extends Statistics<Boolean> {
 
   @Override
   public String toString() {
-    if(!this.isEmpty())
+    if (this.hasNonNullValue())
       return String.format("min: %b, max: %b, num_nulls: %d", min, max, this.getNumNulls());
-    else
+    else if(!this.isEmpty())
+      return String.format("num_nulls: %d, min/max not defined", this.getNumNulls());
+    else  
       return "no stats for this column";
   }
 

http://git-wip-us.apache.org/repos/asf/incubator-parquet-mr/blob/4bf9be34/parquet-column/src/main/java/parquet/column/statistics/DoubleStatistics.java
----------------------------------------------------------------------
diff --git a/parquet-column/src/main/java/parquet/column/statistics/DoubleStatistics.java
b/parquet-column/src/main/java/parquet/column/statistics/DoubleStatistics.java
index c9695f3..ccbf700 100644
--- a/parquet-column/src/main/java/parquet/column/statistics/DoubleStatistics.java
+++ b/parquet-column/src/main/java/parquet/column/statistics/DoubleStatistics.java
@@ -24,7 +24,7 @@ public class DoubleStatistics extends Statistics<Double> {
 
   @Override
   public void updateStats(double value) {
-    if (this.isEmpty()) {
+    if (!this.hasNonNullValue()) {
       initializeStats(value, value);
     } else {
       updateStats(value, value);
@@ -34,7 +34,7 @@ public class DoubleStatistics extends Statistics<Double> {
   @Override
   public void mergeStatisticsMinMax(Statistics stats) {
     DoubleStatistics doubleStats = (DoubleStatistics)stats;
-    if (this.isEmpty()) {
+    if (!this.hasNonNullValue()) {
       initializeStats(doubleStats.getMin(), doubleStats.getMax());
     } else {
       updateStats(doubleStats.getMin(), doubleStats.getMax());
@@ -60,8 +60,10 @@ public class DoubleStatistics extends Statistics<Double> {
 
   @Override
   public String toString() {
-    if(!this.isEmpty())
+    if(this.hasNonNullValue())
       return String.format("min: %.5f, max: %.5f, num_nulls: %d", min, max, this.getNumNulls());
+    else if (!this.isEmpty())
+      return String.format("num_nulls: %d, min/max not defined", this.getNumNulls());
     else
       return "no stats for this column";
   }
@@ -100,4 +102,4 @@ public class DoubleStatistics extends Statistics<Double> {
     this.min = min;
     this.markAsNotEmpty();
   }
-}
\ No newline at end of file
+}

http://git-wip-us.apache.org/repos/asf/incubator-parquet-mr/blob/4bf9be34/parquet-column/src/main/java/parquet/column/statistics/FloatStatistics.java
----------------------------------------------------------------------
diff --git a/parquet-column/src/main/java/parquet/column/statistics/FloatStatistics.java b/parquet-column/src/main/java/parquet/column/statistics/FloatStatistics.java
index b13aafa..54a550d 100644
--- a/parquet-column/src/main/java/parquet/column/statistics/FloatStatistics.java
+++ b/parquet-column/src/main/java/parquet/column/statistics/FloatStatistics.java
@@ -24,7 +24,7 @@ public class FloatStatistics extends Statistics<Float> {
 
   @Override
   public void updateStats(float value) {
-    if (this.isEmpty()) {
+    if (!this.hasNonNullValue()) {
       initializeStats(value, value);
     } else {
       updateStats(value, value);
@@ -34,7 +34,7 @@ public class FloatStatistics extends Statistics<Float> {
   @Override
   public void mergeStatisticsMinMax(Statistics stats) {
     FloatStatistics floatStats = (FloatStatistics)stats;
-    if (this.isEmpty()) {
+    if (!this.hasNonNullValue()) {
       initializeStats(floatStats.getMin(), floatStats.getMax());
     } else {
       updateStats(floatStats.getMin(), floatStats.getMax());
@@ -60,8 +60,10 @@ public class FloatStatistics extends Statistics<Float> {
 
   @Override
   public String toString() {
-    if(!this.isEmpty())
+    if (this.hasNonNullValue())
       return String.format("min: %.5f, max: %.5f, num_nulls: %d", min, max, this.getNumNulls());
+    else if (!this.isEmpty())
+      return String.format("num_nulls: %d, min/max not defined", this.getNumNulls());
     else
       return "no stats for this column";
   }

http://git-wip-us.apache.org/repos/asf/incubator-parquet-mr/blob/4bf9be34/parquet-column/src/main/java/parquet/column/statistics/IntStatistics.java
----------------------------------------------------------------------
diff --git a/parquet-column/src/main/java/parquet/column/statistics/IntStatistics.java b/parquet-column/src/main/java/parquet/column/statistics/IntStatistics.java
index 7bdd6be..0651472 100644
--- a/parquet-column/src/main/java/parquet/column/statistics/IntStatistics.java
+++ b/parquet-column/src/main/java/parquet/column/statistics/IntStatistics.java
@@ -24,7 +24,7 @@ public class IntStatistics extends Statistics<Integer> {
 
   @Override
   public void updateStats(int value) {
-    if (this.isEmpty()) {
+    if (!this.hasNonNullValue()) {
       initializeStats(value, value);
     } else {
       updateStats(value, value);
@@ -34,7 +34,7 @@ public class IntStatistics extends Statistics<Integer> {
   @Override
   public void mergeStatisticsMinMax(Statistics stats) {
     IntStatistics intStats = (IntStatistics)stats;
-    if (this.isEmpty()) {
+    if (!this.hasNonNullValue()) {
       initializeStats(intStats.getMin(), intStats.getMax());
     } else {
       updateStats(intStats.getMin(), intStats.getMax());
@@ -60,8 +60,10 @@ public class IntStatistics extends Statistics<Integer> {
 
   @Override
   public String toString() {
-    if(!this.isEmpty())
+    if (this.hasNonNullValue())
       return String.format("min: %d, max: %d, num_nulls: %d", min, max, this.getNumNulls());
+    else if (!this.isEmpty())
+      return String.format("num_nulls: %d, min/max is not defined", this.getNumNulls());
     else
       return "no stats for this column";
   }

http://git-wip-us.apache.org/repos/asf/incubator-parquet-mr/blob/4bf9be34/parquet-column/src/main/java/parquet/column/statistics/LongStatistics.java
----------------------------------------------------------------------
diff --git a/parquet-column/src/main/java/parquet/column/statistics/LongStatistics.java b/parquet-column/src/main/java/parquet/column/statistics/LongStatistics.java
index bae63a9..7040747 100644
--- a/parquet-column/src/main/java/parquet/column/statistics/LongStatistics.java
+++ b/parquet-column/src/main/java/parquet/column/statistics/LongStatistics.java
@@ -24,7 +24,7 @@ public class LongStatistics extends Statistics<Long> {
 
   @Override
   public void updateStats(long value) {
-    if (this.isEmpty()) {
+    if (!this.hasNonNullValue()) {
       initializeStats(value, value);
     } else {
       updateStats(value, value);
@@ -34,7 +34,7 @@ public class LongStatistics extends Statistics<Long> {
   @Override
   public void mergeStatisticsMinMax(Statistics stats) {
     LongStatistics longStats = (LongStatistics)stats;
-    if (this.isEmpty()) {
+    if (!this.hasNonNullValue()) {
       initializeStats(longStats.getMin(), longStats.getMax());
     } else {
       updateStats(longStats.getMin(), longStats.getMax());
@@ -60,8 +60,10 @@ public class LongStatistics extends Statistics<Long> {
 
   @Override
   public String toString() {
-    if(!this.isEmpty())
+    if (this.hasNonNullValue())
       return String.format("min: %d, max: %d, num_nulls: %d", min, max, this.getNumNulls());
+    else if (!this.isEmpty())
+      return String.format("num_nulls: %d, min/max not defined", this.getNumNulls());
     else
       return "no stats for this column";
   }
@@ -100,4 +102,4 @@ public class LongStatistics extends Statistics<Long> {
     this.min = min;
     this.markAsNotEmpty();
   }
-}
\ No newline at end of file
+}

http://git-wip-us.apache.org/repos/asf/incubator-parquet-mr/blob/4bf9be34/parquet-column/src/main/java/parquet/column/statistics/Statistics.java
----------------------------------------------------------------------
diff --git a/parquet-column/src/main/java/parquet/column/statistics/Statistics.java b/parquet-column/src/main/java/parquet/column/statistics/Statistics.java
index b29b76b..7920e3e 100644
--- a/parquet-column/src/main/java/parquet/column/statistics/Statistics.java
+++ b/parquet-column/src/main/java/parquet/column/statistics/Statistics.java
@@ -28,11 +28,11 @@ import java.util.Arrays;
  */
 public abstract class Statistics<T extends Comparable<T>> {
 
-  private boolean firstValueAccountedFor;
+  private boolean hasNonNullValue;
   private long num_nulls;
 
   public Statistics() {
-    firstValueAccountedFor = false;
+    hasNonNullValue = false;
     num_nulls = 0;
   }
 
@@ -142,7 +142,10 @@ public abstract class Statistics<T extends Comparable<T>>
{
 
     if (this.getClass() == stats.getClass()) {
       incrementNumNulls(stats.getNumNulls());
-      mergeStatisticsMinMax(stats);
+      if (stats.hasNonNullValue()) {
+        mergeStatisticsMinMax(stats);
+        markAsNotEmpty();
+      }
     } else {
       throw new StatisticsClassException(this.getClass().toString(), stats.getClass().toString());
     }
@@ -220,11 +223,22 @@ public abstract class Statistics<T extends Comparable<T>>
{
    * @return true if object is empty, false otherwise
    */
   public boolean isEmpty() {
-    return !firstValueAccountedFor;
+    return !hasNonNullValue && num_nulls == 0;
   }
 
+  /**
+   * Returns whether there have been non-null values added to this statistics
+   */
+  public boolean hasNonNullValue() {
+    return hasNonNullValue;
+  }
+ 
+  /**
+   * Sets the page/column as having a valid non-null value
+   * kind of misnomer here
+   */ 
   protected void markAsNotEmpty() {
-    firstValueAccountedFor = true;
+    hasNonNullValue = true;
   }
 }
 

http://git-wip-us.apache.org/repos/asf/incubator-parquet-mr/blob/4bf9be34/parquet-hadoop/src/main/java/parquet/filter2/statisticslevel/StatisticsFilter.java
----------------------------------------------------------------------
diff --git a/parquet-hadoop/src/main/java/parquet/filter2/statisticslevel/StatisticsFilter.java
b/parquet-hadoop/src/main/java/parquet/filter2/statisticslevel/StatisticsFilter.java
index 4daed5a..02a22e9 100644
--- a/parquet-hadoop/src/main/java/parquet/filter2/statisticslevel/StatisticsFilter.java
+++ b/parquet-hadoop/src/main/java/parquet/filter2/statisticslevel/StatisticsFilter.java
@@ -67,11 +67,13 @@ public class StatisticsFilter implements FilterPredicate.Visitor<Boolean>
{
   }
 
   // is this column chunk composed entirely of nulls?
+  // assumes the column chunk's statistics is not empty
   private boolean isAllNulls(ColumnChunkMetaData column) {
     return column.getStatistics().getNumNulls() == column.getValueCount();
   }
 
   // are there any nulls in this column chunk?
+  // assumes the column chunk's statistics is not empty
   private boolean hasNulls(ColumnChunkMetaData column) {
     return column.getStatistics().getNumNulls() > 0;
   }
@@ -81,6 +83,12 @@ public class StatisticsFilter implements FilterPredicate.Visitor<Boolean>
{
     Column<T> filterColumn = eq.getColumn();
     T value = eq.getValue();
     ColumnChunkMetaData columnChunk = getColumnChunk(filterColumn.getColumnPath());
+    Statistics<T> stats = columnChunk.getStatistics();
+
+    if (stats.isEmpty()) {
+      // we have no statistics available, we cannot drop any chunks
+      return false;
+    }
 
     if (value == null) {
       // we are looking for records where v eq(null)
@@ -94,8 +102,6 @@ public class StatisticsFilter implements FilterPredicate.Visitor<Boolean>
{
       return true;
     }
 
-    Statistics<T> stats = columnChunk.getStatistics();
-
     // drop if value < min || value > max
     return value.compareTo(stats.genericGetMin()) < 0 || value.compareTo(stats.genericGetMax())
> 0;
   }
@@ -105,6 +111,12 @@ public class StatisticsFilter implements FilterPredicate.Visitor<Boolean>
{
     Column<T> filterColumn = notEq.getColumn();
     T value = notEq.getValue();
     ColumnChunkMetaData columnChunk = getColumnChunk(filterColumn.getColumnPath());
+    Statistics<T> stats = columnChunk.getStatistics();
+
+    if (stats.isEmpty()) {
+      // we have no statistics available, we cannot drop any chunks
+      return false;
+    }
 
     if (value == null) {
       // we are looking for records where v notEq(null)
@@ -118,8 +130,6 @@ public class StatisticsFilter implements FilterPredicate.Visitor<Boolean>
{
       return false;
     }
 
-    Statistics<T> stats = columnChunk.getStatistics();
-
     // drop if this is a column where min = max = value
     return value.compareTo(stats.genericGetMin()) == 0 && value.compareTo(stats.genericGetMax())
== 0;
   }
@@ -129,6 +139,12 @@ public class StatisticsFilter implements FilterPredicate.Visitor<Boolean>
{
     Column<T> filterColumn = lt.getColumn();
     T value = lt.getValue();
     ColumnChunkMetaData columnChunk = getColumnChunk(filterColumn.getColumnPath());
+    Statistics<T> stats = columnChunk.getStatistics();
+
+    if (stats.isEmpty()) {
+      // we have no statistics available, we cannot drop any chunks
+      return false;
+    }
 
     if (isAllNulls(columnChunk)) {
       // we are looking for records where v < someValue
@@ -136,8 +152,6 @@ public class StatisticsFilter implements FilterPredicate.Visitor<Boolean>
{
       return true;
     }
 
-    Statistics<T> stats = columnChunk.getStatistics();
-
     // drop if value <= min
     return  value.compareTo(stats.genericGetMin()) <= 0;
   }
@@ -147,6 +161,12 @@ public class StatisticsFilter implements FilterPredicate.Visitor<Boolean>
{
     Column<T> filterColumn = ltEq.getColumn();
     T value = ltEq.getValue();
     ColumnChunkMetaData columnChunk = getColumnChunk(filterColumn.getColumnPath());
+    Statistics<T> stats = columnChunk.getStatistics();
+
+    if (stats.isEmpty()) {
+      // we have no statistics available, we cannot drop any chunks
+      return false;
+    }
 
     if (isAllNulls(columnChunk)) {
       // we are looking for records where v <= someValue
@@ -154,8 +174,6 @@ public class StatisticsFilter implements FilterPredicate.Visitor<Boolean>
{
       return true;
     }
 
-    Statistics<T> stats = columnChunk.getStatistics();
-
     // drop if value < min
     return value.compareTo(stats.genericGetMin()) < 0;
   }
@@ -165,6 +183,12 @@ public class StatisticsFilter implements FilterPredicate.Visitor<Boolean>
{
     Column<T> filterColumn = gt.getColumn();
     T value = gt.getValue();
     ColumnChunkMetaData columnChunk = getColumnChunk(filterColumn.getColumnPath());
+    Statistics<T> stats = columnChunk.getStatistics();
+
+    if (stats.isEmpty()) {
+      // we have no statistics available, we cannot drop any chunks
+      return false;
+    }
 
     if (isAllNulls(columnChunk)) {
       // we are looking for records where v > someValue
@@ -172,8 +196,6 @@ public class StatisticsFilter implements FilterPredicate.Visitor<Boolean>
{
       return true;
     }
 
-    Statistics<T> stats = columnChunk.getStatistics();
-
     // drop if value >= max
     return value.compareTo(stats.genericGetMax()) >= 0;
   }
@@ -183,6 +205,12 @@ public class StatisticsFilter implements FilterPredicate.Visitor<Boolean>
{
     Column<T> filterColumn = gtEq.getColumn();
     T value = gtEq.getValue();
     ColumnChunkMetaData columnChunk = getColumnChunk(filterColumn.getColumnPath());
+    Statistics<T> stats = columnChunk.getStatistics();
+
+    if (stats.isEmpty()) {
+      // we have no statistics available, we cannot drop any chunks
+      return false;
+    }
 
     if (isAllNulls(columnChunk)) {
       // we are looking for records where v >= someValue
@@ -190,8 +218,6 @@ public class StatisticsFilter implements FilterPredicate.Visitor<Boolean>
{
       return true;
     }
 
-    Statistics<T> stats = columnChunk.getStatistics();
-
     // drop if value >= max
     return value.compareTo(stats.genericGetMax()) > 0;
   }
@@ -221,6 +247,19 @@ public class StatisticsFilter implements FilterPredicate.Visitor<Boolean>
{
     ColumnChunkMetaData columnChunk = getColumnChunk(filterColumn.getColumnPath());
     U udp = ud.getUserDefinedPredicate();
     Statistics<T> stats = columnChunk.getStatistics();
+
+    if (stats.isEmpty()) {
+      // we have no statistics available, we cannot drop any chunks
+      return false;
+    }
+
+    if (isAllNulls(columnChunk)) {
+      // there is no min max, there is nothing
+      // else we can say about this chunk, we
+      // cannot drop it.
+      return false;
+    }
+
     parquet.filter2.predicate.Statistics<T> udpStats =
         new parquet.filter2.predicate.Statistics<T>(stats.genericGetMin(), stats.genericGetMax());
 

http://git-wip-us.apache.org/repos/asf/incubator-parquet-mr/blob/4bf9be34/parquet-hadoop/src/main/java/parquet/format/converter/ParquetMetadataConverter.java
----------------------------------------------------------------------
diff --git a/parquet-hadoop/src/main/java/parquet/format/converter/ParquetMetadataConverter.java
b/parquet-hadoop/src/main/java/parquet/format/converter/ParquetMetadataConverter.java
index 198b654..b43429b 100644
--- a/parquet-hadoop/src/main/java/parquet/format/converter/ParquetMetadataConverter.java
+++ b/parquet-hadoop/src/main/java/parquet/format/converter/ParquetMetadataConverter.java
@@ -234,9 +234,11 @@ public class ParquetMetadataConverter {
   public static Statistics toParquetStatistics(parquet.column.statistics.Statistics statistics)
{
     Statistics stats = new Statistics();
     if (!statistics.isEmpty()) {
-      stats.setMax(statistics.getMaxBytes());
-      stats.setMin(statistics.getMinBytes());
       stats.setNull_count(statistics.getNumNulls());
+      if(statistics.hasNonNullValue()) {
+        stats.setMax(statistics.getMaxBytes());
+        stats.setMin(statistics.getMinBytes());
+     }
     }
     return stats;
   }
@@ -246,7 +248,9 @@ public class ParquetMetadataConverter {
     parquet.column.statistics.Statistics stats = parquet.column.statistics.Statistics.getStatsBasedOnType(type);
     // If there was no statistics written to the footer, create an empty Statistics object
and return
     if (statistics != null) {
-      stats.setMinMaxFromBytes(statistics.min.array(), statistics.max.array());
+      if (statistics.isSetMax() && statistics.isSetMin()) {
+        stats.setMinMaxFromBytes(statistics.min.array(), statistics.max.array());
+      }
       stats.setNumNulls(statistics.null_count);
     }
     return stats;

http://git-wip-us.apache.org/repos/asf/incubator-parquet-mr/blob/4bf9be34/parquet-hadoop/src/test/java/parquet/filter2/statisticslevel/TestStatisticsFilter.java
----------------------------------------------------------------------
diff --git a/parquet-hadoop/src/test/java/parquet/filter2/statisticslevel/TestStatisticsFilter.java
b/parquet-hadoop/src/test/java/parquet/filter2/statisticslevel/TestStatisticsFilter.java
index 4e75b20..b7ac931 100644
--- a/parquet-hadoop/src/test/java/parquet/filter2/statisticslevel/TestStatisticsFilter.java
+++ b/parquet-hadoop/src/test/java/parquet/filter2/statisticslevel/TestStatisticsFilter.java
@@ -279,10 +279,10 @@ public class TestStatisticsFilter {
   @Test
   public void testClearExceptionForNots() {
     List<ColumnChunkMetaData> columnMetas = Arrays.asList(
-        getIntColumnMeta(new IntStatistics(), 0L),
-        getDoubleColumnMeta(new DoubleStatistics(), 0L));
+        getDoubleColumnMeta(new DoubleStatistics(), 0L),
+        getIntColumnMeta(new IntStatistics(), 0L));
 
-    FilterPredicate pred = and(eq(intColumn, 17), not(eq(doubleColumn, 12.0)));
+    FilterPredicate pred = and(not(eq(doubleColumn, 12.0)), eq(intColumn, 17));
 
     try {
       canDrop(pred, columnMetas);
@@ -297,7 +297,7 @@ public class TestStatisticsFilter {
   public void testMissingColumn() {
     List<ColumnChunkMetaData> columnMetas = Arrays.asList(getIntColumnMeta(new IntStatistics(),
0L));
     try {
-      canDrop(and(eq(intColumn, 17), eq(doubleColumn, 12.0)), columnMetas);
+      canDrop(and(eq(doubleColumn, 12.0), eq(intColumn, 17)), columnMetas);
       fail("This should throw");
     } catch (IllegalArgumentException e) {
       assertEquals("Column double.column not found in schema!", e.getMessage());

http://git-wip-us.apache.org/repos/asf/incubator-parquet-mr/blob/4bf9be34/parquet-hadoop/src/test/java/parquet/hadoop/TestParquetFileWriter.java
----------------------------------------------------------------------
diff --git a/parquet-hadoop/src/test/java/parquet/hadoop/TestParquetFileWriter.java b/parquet-hadoop/src/test/java/parquet/hadoop/TestParquetFileWriter.java
index 1d45469..5d0b17f 100644
--- a/parquet-hadoop/src/test/java/parquet/hadoop/TestParquetFileWriter.java
+++ b/parquet-hadoop/src/test/java/parquet/hadoop/TestParquetFileWriter.java
@@ -62,8 +62,14 @@ import parquet.schema.PrimitiveType.PrimitiveTypeName;
 import parquet.format.Statistics;
 import parquet.format.converter.ParquetMetadataConverter;
 
+import parquet.example.data.Group;
+import parquet.example.data.simple.SimpleGroup;
+
+import parquet.hadoop.example.GroupWriteSupport;
+
 public class TestParquetFileWriter {
   private static final Log LOG = Log.getLog(TestParquetFileWriter.class);
+  private String writeSchema;
 
   @Test
   public void testWriteRead() throws Exception {
@@ -310,6 +316,36 @@ public class TestParquetFileWriter {
 
   }
 
+  @Test
+  public void testWriteReadStatisticsAllNulls() throws Exception {
+
+    File testFile = new File("target/test/TestParquetFileWriter/testParquetFile").getAbsoluteFile();
+    testFile.delete();
+
+    writeSchema = "message example {\n" +
+            "required binary content;\n" +
+            "}";
+
+    Path path = new Path(testFile.toURI());
+
+    MessageType schema = MessageTypeParser.parseMessageType(writeSchema);
+    Configuration configuration = new Configuration();
+    GroupWriteSupport.setSchema(schema, configuration);
+
+    ParquetWriter<Group> writer = new ParquetWriter<Group>(path, configuration,
new GroupWriteSupport());
+   
+    Group r1 = new SimpleGroup(schema);
+    writer.write(r1);
+    writer.close();
+    
+    ParquetMetadata readFooter = ParquetFileReader.readFooter(configuration, path);
+    
+    // assert the statistics object is not empty
+    assertTrue((readFooter.getBlocks().get(0).getColumns().get(0).getStatistics().isEmpty())
== false);
+    // assert the number of nulls are correct for the first block
+    assertEquals(1, (readFooter.getBlocks().get(0).getColumns().get(0).getStatistics().getNumNulls()));
+  }
+
   private void validateFooters(final List<Footer> metadata) {
     LOG.debug(metadata);
     assertEquals(String.valueOf(metadata), 3, metadata.size());


Mime
View raw message