parquet-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From alexleven...@apache.org
Subject [2/2] parquet-mr git commit: PARQUET-229 Add a strict thrift projection API with backwards compat support
Date Fri, 01 May 2015 00:45:13 GMT
PARQUET-229 Add a strict thrift projection API with backwards compat support

Currently, the thrift projection API accepts strings in a very general glob format that supports not only wildcards like `*` and `?` and expansions (`{x,y,z}`) but also character classes `[abc]`, and negation.
Because of this flexibility, it's hard to give users good error reporting, for example letting them know that when they requested columns `foo.bar.{a,b,c}` there is actually no such column `foo.bar.c`.

This PR introduces a new syntax that supports a more restrictive form of glob syntax and enforces that all **expansions** of a glob match a column, not just that all globs match a column. The new syntax is very simple and only has four special characters: `{`,`}`,`,`, and `*`

It supports glob expansion, for example:
`home.{phone,address}` or `org.apache{-incubator,}.foo`

And the wildcard `*` which is treated the same way as java regex treats `(.*)`, for example:
`home.*` or `org.apache*.foo`

In the new syntax glob paths mean "keep all the child fields of the field matched by this glob", just like variable access would work in a programming language. For example: `x.y.z` means keep field `z` and all of its children (if any). So it's not necessary to do `x.y.z.*`. However, `x.y.z` would not keep field `x.y.zoo`. If that was desired, then `x.y.z*` could be used instead.

Setting `"parquet.thrift.column.filter"` will result in the same behavior that it does currently in master, though a deprecation warning will be logged. The classes that implement the current behavior have been marked as deprecated, and using this will log a warning.

Setting `"parquet.thrift.column.projection.globs"` will instead use this new syntax, and entry points in the various Builder's in the codebase is added as well.

This PR does a little bit of cleanup as well, moving some shared methods to a `Strings` class and simplifying some of the class hierarchy in `ThriftSchemaConverterVisitor`. There are a few `// TODO Why?` added as well that I wanted to ask about.

Author: Alex Levenson <alexlevenson@twitter.com>

Closes #150 from isnotinvain/alexlevenson/strict-projection and squashes the following commits:

6c58e1c [Alex Levenson] clean up docs
1aab666 [Alex Levenson] Merge branch 'master' into alexlevenson/strict-projection
92b6ba6 [Alex Levenson] Merge branch 'master' into alexlevenson/strict-projection
ceaf6cd [Alex Levenson] update packages
a28dc19 [Alex Levenson] Merge branch 'master' into alexlevenson/strict-projection
ebc4761 [Alex Levenson] Remove unneeded TODO
c2e12c5 [Alex Levenson] Update docs
eecf5f3 [Alex Levenson] Merge branch 'master' into alexlevenson/strict-projection
671f0b5 [Alex Levenson] Merge branch 'master' into alexlevenson/strict-projection
298cad8 [Alex Levenson] Add warning
8b7e4bb [Alex Levenson] Add more comments to StrictFieldProjectionFilter
8f65ed2 [Alex Levenson] Add tests for strict projection filter
c81d9e1 [Alex Levenson] Docs and cleanup for FieldProjectionFilter
71139a7 [Alex Levenson] Add tests for FieldsPath
7d17068 [Alex Levenson] Tests for WildcardPath
8a3d2af [Alex Levenson] Add some tests
f3fd931 [Alex Levenson] More docs
0b190c3 [Alex Levenson] Add more comments
6e67df5 [Alex Levenson] Add a strict thrift projection API with backwards support for the current API


Project: http://git-wip-us.apache.org/repos/asf/parquet-mr/repo
Commit: http://git-wip-us.apache.org/repos/asf/parquet-mr/commit/7fc79983
Tree: http://git-wip-us.apache.org/repos/asf/parquet-mr/tree/7fc79983
Diff: http://git-wip-us.apache.org/repos/asf/parquet-mr/diff/7fc79983

Branch: refs/heads/master
Commit: 7fc7998398373a14b4cdc0ce18abdeb221b1ccf9
Parents: 22c6d08
Author: Alex Levenson <alexlevenson@twitter.com>
Authored: Thu Apr 30 17:45:11 2015 -0700
Committer: Alex Levenson <alexlevenson@twitter.com>
Committed: Thu Apr 30 17:45:11 2015 -0700

----------------------------------------------------------------------
 .../parquet/cascading/ParquetValueScheme.java   |  42 +++-
 .../main/java/org/apache/parquet/Strings.java   | 110 +++++++++
 .../org/apache/parquet/glob/GlobExpander.java   | 114 ++++++++++
 .../java/org/apache/parquet/glob/GlobNode.java  | 157 +++++++++++++
 .../org/apache/parquet/glob/GlobParser.java     | 224 +++++++++++++++++++
 .../org/apache/parquet/glob/WildcardPath.java   | 122 ++++++++++
 .../parquet/hadoop/metadata/ColumnPath.java     |  12 +-
 .../java/org/apache/parquet/glob/TestGlob.java  | 144 ++++++++++++
 .../apache/parquet/glob/TestWildcardPath.java   | 125 +++++++++++
 .../hadoop/thrift/ThriftReadSupport.java        |  78 +++++--
 .../thrift/ThriftSchemaConvertVisitor.java      |  36 ++-
 .../parquet/thrift/ThriftSchemaConverter.java   |  69 +++---
 .../projection/FieldProjectionFilter.java       |  92 +++-----
 .../parquet/thrift/projection/FieldsPath.java   |  61 ++---
 .../thrift/projection/PathGlobPattern.java      | 184 ---------------
 .../projection/StrictFieldProjectionFilter.java | 182 +++++++++++++++
 .../DeprecatedFieldProjectionFilter.java        | 107 +++++++++
 .../projection/deprecated/PathGlobPattern.java  | 187 ++++++++++++++++
 ...stParquetToThriftReadWriteAndProjection.java |   2 +-
 .../thrift/TestThriftSchemaConverter.java       |  73 +++---
 .../thrift/projection/PathGlobPatternTest.java  |  59 -----
 .../thrift/projection/TestFieldsPath.java       | 166 ++++++++++++++
 .../TestStrictFieldProjectionFilter.java        | 119 ++++++++++
 .../deprecated/PathGlobPatternTest.java         |  59 +++++
 parquet_cascading.md                            |  71 +++---
 pom.xml                                         |   3 +
 26 files changed, 2098 insertions(+), 500 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/parquet-mr/blob/7fc79983/parquet-cascading/src/main/java/org/apache/parquet/cascading/ParquetValueScheme.java
----------------------------------------------------------------------
diff --git a/parquet-cascading/src/main/java/org/apache/parquet/cascading/ParquetValueScheme.java b/parquet-cascading/src/main/java/org/apache/parquet/cascading/ParquetValueScheme.java
index 33d1bb7..9549ef4 100644
--- a/parquet-cascading/src/main/java/org/apache/parquet/cascading/ParquetValueScheme.java
+++ b/parquet-cascading/src/main/java/org/apache/parquet/cascading/ParquetValueScheme.java
@@ -51,17 +51,21 @@ public abstract class ParquetValueScheme<T> extends Scheme<JobConf, RecordReader
 
   public static final class Config<T> implements Serializable {
     private final FilterPredicate filterPredicate;
-    private final String projectionString;
+    private final String deprecatedProjectionString;
+    private final String strictProjectionString;
     private final Class<T> klass;
-    private Config(Class<T> klass, FilterPredicate filterPredicate, String projectionString) {
+
+    private Config(Class<T> klass, FilterPredicate filterPredicate, String deprecatedProjectionString, String strictProjectionString) {
       this.filterPredicate = filterPredicate;
-      this.projectionString = projectionString;
+      this.deprecatedProjectionString = deprecatedProjectionString;
+      this.strictProjectionString = strictProjectionString;
       this.klass = klass;
     }
 
     public Config() {
       filterPredicate = null;
-      projectionString = null;
+      deprecatedProjectionString = null;
+      strictProjectionString = null;
       klass = null;
     }
 
@@ -69,8 +73,13 @@ public abstract class ParquetValueScheme<T> extends Scheme<JobConf, RecordReader
       return filterPredicate;
     }
 
+    @Deprecated
     public String getProjectionString() {
-      return projectionString;
+      return deprecatedProjectionString;
+    }
+
+    public String getStrictProjectionString() {
+      return strictProjectionString;
     }
 
     public Class<T> getKlass() {
@@ -78,15 +87,20 @@ public abstract class ParquetValueScheme<T> extends Scheme<JobConf, RecordReader
     }
 
     public Config<T> withFilterPredicate(FilterPredicate f) {
-      return new Config<T>(this.klass, checkNotNull(f, "filterPredicate"), this.projectionString);
+      return new Config<T>(this.klass, checkNotNull(f, "filterPredicate"), this.deprecatedProjectionString, this.strictProjectionString);
     }
 
+    @Deprecated
     public Config<T> withProjectionString(String p) {
-      return new Config<T>(this.klass, this.filterPredicate, checkNotNull(p, "projectionFilter"));
+      return new Config<T>(this.klass, this.filterPredicate, checkNotNull(p, "projectionString"), this.strictProjectionString);
+    }
+
+    public Config<T> withStrictProjectionString(String p) {
+      return new Config<T>(this.klass, this.filterPredicate, this.deprecatedProjectionString, checkNotNull(p, "projectionString"));
     }
 
     public Config<T> withRecordClass(Class<T> klass) {
-      return new Config<T>(checkNotNull(klass, "recordClass"), this.filterPredicate, this.projectionString);
+      return new Config<T>(checkNotNull(klass, "recordClass"), this.filterPredicate, this.deprecatedProjectionString, this.strictProjectionString);
     }
   }
 
@@ -105,9 +119,16 @@ public abstract class ParquetValueScheme<T> extends Scheme<JobConf, RecordReader
     this.config = config;
   }
 
+  @Deprecated
   private void setProjectionPushdown(JobConf jobConf) {
-    if (this.config.projectionString!= null) {
-      ThriftReadSupport.setProjectionPushdown(jobConf, this.config.projectionString);
+    if (this.config.deprecatedProjectionString != null) {
+      ThriftReadSupport.setProjectionPushdown(jobConf, this.config.deprecatedProjectionString);
+    }
+  }
+
+  private void setStrictProjectionPushdown(JobConf jobConf) {
+    if (this.config.strictProjectionString != null) {
+      ThriftReadSupport.setStrictFieldProjectionFilter(jobConf, this.config.strictProjectionString);
     }
   }
 
@@ -120,6 +141,7 @@ public abstract class ParquetValueScheme<T> extends Scheme<JobConf, RecordReader
   public void sourceConfInit(FlowProcess<JobConf> jobConfFlowProcess, Tap<JobConf, RecordReader, OutputCollector> jobConfRecordReaderOutputCollectorTap, final JobConf jobConf) {
     setPredicatePushdown(jobConf);
     setProjectionPushdown(jobConf);
+    setStrictProjectionPushdown(jobConf);
     setRecordClass(jobConf);
   }
 

http://git-wip-us.apache.org/repos/asf/parquet-mr/blob/7fc79983/parquet-common/src/main/java/org/apache/parquet/Strings.java
----------------------------------------------------------------------
diff --git a/parquet-common/src/main/java/org/apache/parquet/Strings.java b/parquet-common/src/main/java/org/apache/parquet/Strings.java
new file mode 100644
index 0000000..4d9fcc9
--- /dev/null
+++ b/parquet-common/src/main/java/org/apache/parquet/Strings.java
@@ -0,0 +1,110 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.parquet;
+
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.Iterator;
+import java.util.List;
+
+import org.apache.parquet.glob.GlobExpander;
+import org.apache.parquet.glob.WildcardPath;
+
+public final class Strings {
+  private Strings() { }
+
+  /**
+   * Join an Iterable of Strings into a single string with a delimiter.
+   * For example, join(Arrays.asList("foo","","bar","x"), "|") would return
+   * "foo||bar|x"
+   *
+   * @param s an iterable of strings
+   * @param on the delimiter
+   * @return a single joined string
+   */
+  public static String join(Iterable<String> s, String on) {
+    Iterator<String> iter = s.iterator();
+    StringBuilder sb = new StringBuilder();
+    while (iter.hasNext()) {
+      sb.append(iter.next());
+      if (iter.hasNext()) {
+        sb.append(on);
+      }
+    }
+    return sb.toString();
+  }
+
+  /**
+   * Join an Array of Strings into a single string with a delimiter.
+   * For example, join(new String[] {"foo","","bar","x"}, "|") would return
+   * "foo||bar|x"
+   *
+   * @param s an iterable of strings
+   * @param on the delimiter
+   * @return a single joined string
+   */
+  public static String join(String[] s, String on) {
+    return join(Arrays.asList(s), on);
+  }
+
+  /**
+   * Returns true if s.isEmpty() or s == null
+   */
+  public static boolean isNullOrEmpty(String s) {
+    return s == null || s.isEmpty();
+  }
+
+  /**
+   * Expands a string with braces ("{}") into all of its possible permutations.
+   * We call anything inside of {} braces a "one-of" group.
+   *
+   * The only special characters in this glob syntax are '}', '{' and ','
+   *
+   * The top-level pattern must not contain any commas, but a "one-of" group separates
+   * its elements with commas, and a one-of group may contain sub one-of groups.
+   *
+   * For example:
+   * start{a,b,c}end -> startaend, startbend, startcend
+   * start{a,{b,c},d} -> startaend, startbend, startcend, startdend
+   * {a,b,c} -> a, b, c
+   * start{a, b{x,y}} -> starta, startbx, startby
+   *
+   * @param globPattern a string in the format described above
+   * @return a list of all the strings that would satisfy globPattern, including duplicates
+   */
+  public static List<String> expandGlob(String globPattern) {
+    return GlobExpander.expand(globPattern);
+  }
+
+  /**
+   * Expands a string according to {@link #expandGlob(String)}, and then constructs a {@link WildcardPath}
+   * for each expanded result which can be used to match strings as described in {@link WildcardPath}.
+   *
+   * @param globPattern a String to be passed to {@link #expandGlob(String)}
+   * @param delim the delimeter used by {@link WildcardPath}
+   */
+  public static List<WildcardPath> expandGlobToWildCardPaths(String globPattern, char delim) {
+    List<WildcardPath> ret = new ArrayList<WildcardPath>();
+    for (String expandedGlob : Strings.expandGlob(globPattern)) {
+      ret.add(new WildcardPath(globPattern, expandedGlob, delim));
+    }
+    return ret;
+  }
+}

http://git-wip-us.apache.org/repos/asf/parquet-mr/blob/7fc79983/parquet-common/src/main/java/org/apache/parquet/glob/GlobExpander.java
----------------------------------------------------------------------
diff --git a/parquet-common/src/main/java/org/apache/parquet/glob/GlobExpander.java b/parquet-common/src/main/java/org/apache/parquet/glob/GlobExpander.java
new file mode 100644
index 0000000..79b633c
--- /dev/null
+++ b/parquet-common/src/main/java/org/apache/parquet/glob/GlobExpander.java
@@ -0,0 +1,114 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.parquet.glob;
+
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.List;
+
+import org.apache.parquet.glob.GlobNode.Atom;
+import org.apache.parquet.glob.GlobNode.GlobNodeSequence;
+import org.apache.parquet.glob.GlobNode.OneOf;
+
+/**
+ * Implementation of {@link org.apache.parquet.Strings#expandGlob(String)}
+ */
+public final class GlobExpander {
+  private GlobExpander()  { }
+
+  /**
+   * See {@link org.apache.parquet.Strings#expandGlob(String)} for docs.
+   */
+  public static List<String> expand(String globPattern) {
+    return GlobExpanderImpl.expand(GlobParser.parse(globPattern));
+  }
+
+  /**
+   * Transforms a tree of {@link GlobNode} into a list of all the strings that satisfy
+   * this tree.
+   */
+  private final static class GlobExpanderImpl implements GlobNode.Visitor<List<String>> {
+    private static final GlobExpanderImpl INSTANCE = new GlobExpanderImpl();
+
+    private GlobExpanderImpl() {}
+
+    public static List<String> expand(GlobNode node) {
+      return node.accept(INSTANCE);
+    }
+
+    @Override
+    public List<String> visit(Atom atom) {
+      // atoms are the base case, just return a singleton list
+      return Arrays.asList(atom.get());
+    }
+
+    @Override
+    public List<String> visit(OneOf oneOf) {
+      // in the case of OneOf, we just need to take all of
+      // the possible values the OneOf represents and
+      // union them together
+      List<String> results = new ArrayList<String>();
+      for (GlobNode n : oneOf.getChildren()) {
+        results.addAll(n.accept(this));
+      }
+      return results;
+    }
+
+    @Override
+    public List<String> visit(GlobNodeSequence seq) {
+      // in the case of a sequence, for each child
+      // we need to expand the child into all of its
+      // possibilities, then do a cross product of
+      // all the children, in order.
+
+      List<String> results = new ArrayList<String>();
+      for (GlobNode n : seq.getChildren()) {
+        results = crossOrTakeNonEmpty(results, n.accept(this));
+      }
+      return results;
+    }
+
+    /**
+     * Computes the cross product of two lists by adding each string in list1 to each string in list2.
+     * If one of the lists is empty, a copy of the other list is returned.
+     * If both are empty, an empty list is returned.
+     */
+    public static List<String> crossOrTakeNonEmpty(List<String> list1, List<String> list2) {
+      if (list1.isEmpty()) {
+        ArrayList<String> result = new ArrayList<String>(list2.size());
+        result.addAll(list2);
+        return result;
+      }
+
+      if (list2.isEmpty()) {
+        ArrayList<String> result = new ArrayList<String>(list1.size());
+        result.addAll(list1);
+        return result;
+      }
+
+      List<String> result = new ArrayList<String>(list1.size() * list2.size());
+      for (String s1 : list1) {
+        for (String s2 : list2) {
+          result.add(s1 + s2);
+        }
+      }
+      return result;
+    }
+  }
+}
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/parquet-mr/blob/7fc79983/parquet-common/src/main/java/org/apache/parquet/glob/GlobNode.java
----------------------------------------------------------------------
diff --git a/parquet-common/src/main/java/org/apache/parquet/glob/GlobNode.java b/parquet-common/src/main/java/org/apache/parquet/glob/GlobNode.java
new file mode 100644
index 0000000..c63c780
--- /dev/null
+++ b/parquet-common/src/main/java/org/apache/parquet/glob/GlobNode.java
@@ -0,0 +1,157 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.parquet.glob;
+
+import java.util.List;
+
+/**
+ * A GlobNode represents a tree structure for describing a parsed glob pattern.
+ *
+ * GlobNode uses the visitor pattern for tree traversal.
+ *
+ * See {@link org.apache.parquet.Strings#expandGlob(String)}
+ */
+interface GlobNode {
+  <R> R accept(Visitor<R> visitor);
+
+  static interface Visitor<T> {
+    T visit(Atom atom);
+    T visit(OneOf oneOf);
+    T visit(GlobNodeSequence seq);
+  }
+
+  /**
+   * An Atom is just a String, it's a concrete String that is either part
+   * of the top-level pattern, or one of the choices in a OneOf clause, or an
+   * element in a GlobNodeSequence. In this sense it's the base case or leaf node
+   * of a GlobNode tree.
+   *
+   * For example, in pre{x,y{a,b}}post pre, x, y, z, b, and post are all Atoms.
+   */
+  static class Atom implements GlobNode {
+    private final String s;
+
+    public Atom(String s) {
+      this.s = s;
+    }
+
+    public String get() {
+      return s;
+    }
+
+    @Override
+    public boolean equals(Object o) {
+      if (this == o) return true;
+      return getClass() == o.getClass() && s.equals(((Atom) o).s);
+    }
+
+    @Override
+    public int hashCode() {
+      return s.hashCode();
+    }
+
+    @Override
+    public String toString() {
+      return "Atom(" + s + ")";
+    }
+
+    @Override
+    public <R> R accept(Visitor<R> visitor) {
+      return visitor.visit(this);
+    }
+  }
+
+  /**
+   * A OneOf represents a {} clause in a glob pattern, which means
+   * "one of the elements of this set must be satisfied", for example:
+   * in pre{x,y} {x,y} is a OneOf, and in  or pre{x, {a,b}}post both {x, {a,b}}
+   * and {a,b} are OneOfs.
+   */
+  static class OneOf implements GlobNode {
+    private final List<GlobNode> children;
+
+    public OneOf(List<GlobNode> children) {
+      this.children = children;
+    }
+
+    public List<GlobNode> getChildren() {
+      return children;
+    }
+
+    @Override
+    public boolean equals(Object o) {
+      if (this == o) return true;
+      return getClass() == o.getClass() && children.equals(((OneOf) o).children);
+    }
+
+    @Override
+    public int hashCode() {
+      return children.hashCode();
+    }
+
+    @Override
+    public String toString() {
+      return "OneOf" + children;
+    }
+
+    @Override
+    public <R> R accept(Visitor<R> visitor) {
+      return visitor.visit(this);
+    }
+  }
+
+  /**
+   * A GlobNodeSequence is an ordered collection of GlobNodes that must be satisfied in order,
+   * and represents structures like pre{x,y}post or {x,y}{a,b}. In {test, pre{x,y}post}, pre{x,y}post is a
+   * GlobNodeSequence. Unlike a OneOf, GlobNodeSequence's children have an ordering that is meaningful and
+   * the requirements of its children must each be satisfied.
+   */
+  static class GlobNodeSequence implements GlobNode {
+    private final List<GlobNode> children;
+
+    public GlobNodeSequence(List<GlobNode> children) {
+      this.children = children;
+    }
+
+    public List<GlobNode> getChildren() {
+      return children;
+    }
+
+    @Override
+    public boolean equals(Object o) {
+      if (this == o) return true;
+      return getClass() == o.getClass() && children.equals(((OneOf) o).children);
+    }
+
+    @Override
+    public int hashCode() {
+      return children.hashCode();
+    }
+
+    @Override
+    public String toString() {
+      return "GlobNodeSequence" + children;
+    }
+
+    @Override
+    public <R> R accept(Visitor<R> visitor) {
+      return visitor.visit(this);
+    }
+  }
+}

http://git-wip-us.apache.org/repos/asf/parquet-mr/blob/7fc79983/parquet-common/src/main/java/org/apache/parquet/glob/GlobParser.java
----------------------------------------------------------------------
diff --git a/parquet-common/src/main/java/org/apache/parquet/glob/GlobParser.java b/parquet-common/src/main/java/org/apache/parquet/glob/GlobParser.java
new file mode 100644
index 0000000..d49bd52
--- /dev/null
+++ b/parquet-common/src/main/java/org/apache/parquet/glob/GlobParser.java
@@ -0,0 +1,224 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.parquet.glob;
+
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.List;
+
+import org.apache.parquet.glob.GlobNode.Atom;
+import org.apache.parquet.glob.GlobNode.GlobNodeSequence;
+import org.apache.parquet.glob.GlobNode.OneOf;
+
+final class GlobParser {
+  private GlobParser() { }
+
+  /**
+   * Parse a String into a {@link GlobNodeSequence}
+   *
+   * See {@link org.apache.parquet.Strings#expandGlob(String)}
+   */
+  public static GlobNodeSequence parse(String pattern) {
+    /*
+     * The parse algorithm works as follows, assuming we are parsing:
+     * "apache{one,pre{x,y}post,two}parquet{a,b}"
+     *
+     * 1) Begin scanning the string until we find the first {
+     *
+     * 2) Now that we've found the beginning of a glob group, scan forwards
+     *    until the end of this glob group (by counting { and } we see until we find
+     *    the closing } for the group we found in step 1).
+     *
+     * 3) Once the matching closing } is found we need to do two things. First, everything
+     *    from the end of the last group up to start of this group is an Atom, so in the example
+     *    above, once we've found that "{one,pre{x,y}post,two}" is the first group, we need to grab
+     *    "apache" and treat it as an atom and add it to our sequence.
+     *    Then, we parse "{one,pre{x,y}post,two}" using a similar but slightly different function (parseOneOf)
+     *    and add the result from that to our sequence.
+     *
+     * 4) Repeat until the end of the string -- so next we find {a,b} and add "parquet" as an Atom and parse
+     *    {a,b} using parseOneOf.
+     */
+
+    if (pattern.isEmpty() || pattern.equals("{}")) {
+      return new GlobNodeSequence(Arrays.<GlobNode>asList(new Atom("")));
+    }
+
+    // the outer parse method needs to parse the pattern into a
+    // GlobNodeSequence, though it may end up being a singleton sequence
+    List<GlobNode> children = new ArrayList<GlobNode>();
+
+    int unmatchedBraces = 0; // count of unmatched braces
+    int firstBrace = 0; // open brace of current group being processsed
+    int anchor = 0; // first un-parsed character position
+
+    for (int i = 0; i < pattern.length(); i++) {
+      char c = pattern.charAt(i);
+
+      switch (c) {
+        case ',':
+          if (unmatchedBraces == 0) {
+            // commas not allowed in the top level expression
+            // TODO: maybe turn this check off?
+            throw new GlobParseException("Unexpected comma outside of a {} group:\n"
+                + annotateMessage(pattern, i));
+          }
+          break;
+        case '{':
+          if (unmatchedBraces == 0) {
+            // this is the first brace of an outermost {} group
+            firstBrace = i;
+          }
+          unmatchedBraces++;
+          break;
+        case '}':
+          unmatchedBraces--;
+          if (unmatchedBraces < 0) {
+            throw new GlobParseException("Unexpected closing }:\n"
+                + annotateMessage(pattern, i));
+          }
+          if (unmatchedBraces == 0) {
+            // grab everything from the end of the last group up to here,
+            // not including the close brace, it is an Atom in our sequence
+            // (assuming it's not empty)
+            if (anchor != firstBrace) {
+              // not empty!
+              // (substring's end param is exclusive)
+              children.add(new Atom(pattern.substring(anchor, firstBrace)));
+            }
+
+            // grab the group, parse it, add it to our sequence, and then continue
+            // note that we skip the braces on both sides (substring's end param is exclusive)
+            children.add(parseOneOf(pattern.substring(firstBrace + 1, i)));
+
+            // we have now parsed all the way up to here, the next un-parsed char is i + 1
+            anchor = i + 1;
+          }
+          break;
+      }
+    }
+
+    if (unmatchedBraces > 0) {
+      throw new GlobParseException("Not enough close braces in: " + pattern);
+    }
+
+    if (anchor != pattern.length()) {
+      // either there were no {} groups, or there were some characters after the
+      // last }, either way whatever is left (could be the entire input) is an Atom
+      // in our sequence
+      children.add(new Atom(pattern.substring(anchor, pattern.length())));
+    }
+
+    return new GlobNodeSequence(children);
+  }
+
+  private static OneOf parseOneOf(String pattern) {
+    /*
+     * This method is only called when parsing the inside of a {} expression.
+     * So in the example above, of calling parse("apache{one,pre{x,y}post,two}parquet{a,b}")
+     * this method will get called on first "one,pre{x,y}post,two", then on "x,y" and then on "a,b"
+     *
+     * The inside of a {} expression essentially means "one of these comma separated expressions".
+     * So this gets parsed slightly differently than the top level string passed to parse().
+     *
+     * The algorithm works as follows:
+     * 1) Split the string on ',' -- but only commas that are not inside of {} expressions
+     * 2) Each of the splits can be parsed via the parse() method above
+     * 3) Add all parsed splits to a single parent OneOf.
+     */
+
+    // this inner parse method needs to parse the pattern into a
+    // OneOf, though it may end up being a singleton OneOf
+    List<GlobNode> children = new ArrayList<GlobNode>();
+
+    int unmatchedBraces = 0; // count of unmatched braces
+    int anchor = 0; // first un-parsed character position
+
+    for (int i = 0; i < pattern.length(); i++) {
+      char c = pattern.charAt(i);
+
+      switch (c) {
+        case ',':
+          // only "split" on commas not nested inside of {}
+          if (unmatchedBraces == 0) {
+            // ok, this comma is not inside of a {}, so
+            // grab everything from anchor to here, parse it, and add it
+            // as one of the options in this OneOf
+            children.add(parse(pattern.substring(anchor, i)));
+
+            // we have now parsed up to this comma, the next un-parsed char is i + 1
+            anchor = i + 1;
+          }
+          break;
+        case '{':
+          unmatchedBraces++;
+          break;
+        case '}':
+          unmatchedBraces--;
+          if (unmatchedBraces < 0) {
+            throw new GlobParseException("Unexpected closing }:\n"
+                + annotateMessage(pattern, i));
+          }
+          break;
+      }
+    }
+
+    if (unmatchedBraces > 0) {
+      throw new GlobParseException("Not enough close braces in: " + pattern);
+    }
+
+    if (anchor != pattern.length()) {
+      // either there were no commas outside of {} groups, or there were some characters after the
+      // last comma, either way whatever is left (could be the entire input) is an Atom
+      // in our sequence
+      children.add(parse(pattern.substring(anchor, pattern.length())));
+    }
+
+    if (pattern.length() > 0 && pattern.charAt(pattern.length() - 1) == ',') {
+      // the above loop won't handle a trailing comma
+      children.add(parse(""));
+    }
+
+    return new OneOf(children);
+  }
+
+  // for pretty printing which character had the error
+  private static String annotateMessage(String message, int pos) {
+    StringBuilder sb = new StringBuilder(message);
+    sb.append('\n');
+    for (int i = 0; i < pos; i++) {
+      sb.append('-');
+    }
+    sb.append('^');
+    return sb.toString();
+  }
+
+  public static class GlobParseException extends RuntimeException {
+    public GlobParseException() {
+    }
+
+    public GlobParseException(String message) {
+      super(message);
+    }
+
+    public GlobParseException(String message, Throwable cause) {
+      super(message, cause);
+    }
+  }
+}

http://git-wip-us.apache.org/repos/asf/parquet-mr/blob/7fc79983/parquet-common/src/main/java/org/apache/parquet/glob/WildcardPath.java
----------------------------------------------------------------------
diff --git a/parquet-common/src/main/java/org/apache/parquet/glob/WildcardPath.java b/parquet-common/src/main/java/org/apache/parquet/glob/WildcardPath.java
new file mode 100644
index 0000000..e1e4bd3
--- /dev/null
+++ b/parquet-common/src/main/java/org/apache/parquet/glob/WildcardPath.java
@@ -0,0 +1,122 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.parquet.glob;
+
+import java.util.regex.Pattern;
+
+import org.apache.parquet.Preconditions;
+
+/**
+ * Holds a String with wildcards ('*'), and can answer whether a given string matches this WildcardPath.
+ * For example:
+ * "foo.*.baz" or "foo*baz.bar*"
+ *
+ * The '*' in "foo*bar" is treated the same way that java regex treats "(.*)",
+ * and all WildcardPath's are considered to match child paths.
+ * For example, "foo.bar" will match "foo.bar.baz". It will not match "foo.barbaz" however.
+ * To match "foo.barbaz" the pattern "foo.bar*" could be used, which would also match "foo.barbaz.x"
+ *
+ * Only '*' is  considered a special character.
+ * All other characters are not treated as special characters, including '{', '}', '.', and '/'
+ * with one exception -- the delimiter character is used for matching against child paths as explained above.
+ *
+ * It is assumed that {} globs have already been expanded before constructing
+ * this object.
+ */
+public class WildcardPath {
+  private static final String STAR_REGEX = "(.*)";
+  private static final String MORE_NESTED_FIELDS_TEMPLATE = "((%s).*)?";
+  private final String parentGlobPath;
+  private final String originalPattern;
+  private final Pattern pattern;
+
+  public WildcardPath(String parentGlobPath, String wildcardPath, char delim) {
+    this.parentGlobPath = Preconditions.checkNotNull(parentGlobPath, "parentGlobPath");
+    this.originalPattern = Preconditions.checkNotNull(wildcardPath, "wildcardPath");
+    this.pattern = Pattern.compile(buildRegex(wildcardPath, delim));
+  }
+
+  public static String buildRegex(String wildcardPath, char delim) {
+    if (wildcardPath.isEmpty()) {
+      return wildcardPath;
+    }
+
+    String delimStr = Pattern.quote(Character.toString(delim));
+
+    String[] splits = wildcardPath.split("\\*", -1); // -1 means keep trailing empty strings
+    StringBuilder regex = new StringBuilder();
+
+    for (int i = 0; i < splits.length; i++) {
+      if ((i == 0 || i == splits.length - 1) && splits[i].isEmpty()) {
+        // there was a * at the beginning or end of the string, so add a regex wildcard
+        regex.append(STAR_REGEX);
+        continue;
+      }
+
+      if (splits[i].isEmpty()) {
+        // means there was a double asterisk, we've already
+        // handled this just keep going.
+        continue;
+      }
+
+      // don't treat this part of the string as a regex, escape
+      // the entire thing
+      regex.append(Pattern.quote(splits[i]));
+
+      if (i < splits.length - 1) {
+        // this isn't the last split, so add a *
+        regex.append(STAR_REGEX);
+      }
+    }
+    // x.y.z should match "x.y.z" and also "x.y.z.foo.bar"
+    regex.append(String.format(MORE_NESTED_FIELDS_TEMPLATE, delimStr));
+    return regex.toString();
+  }
+
+  public boolean matches(String path) {
+    return pattern.matcher(path).matches();
+  }
+
+  public String getParentGlobPath() {
+    return parentGlobPath;
+  }
+
+  public String getOriginalPattern() {
+    return originalPattern;
+  }
+
+  @Override
+  public String toString() {
+    return String.format("WildcardPath(parentGlobPath: '%s', pattern: '%s')",
+        parentGlobPath, originalPattern);
+  }
+
+  @Override
+  public boolean equals(Object o) {
+    if (this == o) return true;
+    if (o == null || getClass() != o.getClass()) return false;
+    WildcardPath wildcardPath = (WildcardPath) o;
+    return originalPattern.equals(wildcardPath.originalPattern);
+  }
+
+  @Override
+  public int hashCode() {
+    return originalPattern.hashCode();
+  }
+}

http://git-wip-us.apache.org/repos/asf/parquet-mr/blob/7fc79983/parquet-common/src/main/java/org/apache/parquet/hadoop/metadata/ColumnPath.java
----------------------------------------------------------------------
diff --git a/parquet-common/src/main/java/org/apache/parquet/hadoop/metadata/ColumnPath.java b/parquet-common/src/main/java/org/apache/parquet/hadoop/metadata/ColumnPath.java
index a34cb91..4383b0f 100644
--- a/parquet-common/src/main/java/org/apache/parquet/hadoop/metadata/ColumnPath.java
+++ b/parquet-common/src/main/java/org/apache/parquet/hadoop/metadata/ColumnPath.java
@@ -22,6 +22,8 @@ import java.io.Serializable;
 import java.util.Arrays;
 import java.util.Iterator;
 
+import org.apache.parquet.Strings;
+
 import static org.apache.parquet.Preconditions.checkNotNull;
 
 public final class ColumnPath implements Iterable<String>, Serializable {
@@ -66,15 +68,7 @@ public final class ColumnPath implements Iterable<String>, Serializable {
   }
 
   public String toDotString() {
-    Iterator<String> iter = Arrays.asList(p).iterator();
-    StringBuilder sb = new StringBuilder();
-    while (iter.hasNext()) {
-      sb.append(iter.next());
-      if (iter.hasNext()) {
-        sb.append('.');
-      }
-    }
-    return sb.toString();
+    return Strings.join(p, ".");
   }
 
   @Override

http://git-wip-us.apache.org/repos/asf/parquet-mr/blob/7fc79983/parquet-common/src/test/java/org/apache/parquet/glob/TestGlob.java
----------------------------------------------------------------------
diff --git a/parquet-common/src/test/java/org/apache/parquet/glob/TestGlob.java b/parquet-common/src/test/java/org/apache/parquet/glob/TestGlob.java
new file mode 100644
index 0000000..d9902a3
--- /dev/null
+++ b/parquet-common/src/test/java/org/apache/parquet/glob/TestGlob.java
@@ -0,0 +1,144 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.parquet.glob;
+
+import java.util.Arrays;
+
+import org.apache.parquet.Strings;
+import org.apache.parquet.glob.GlobParser.GlobParseException;
+import org.junit.Test;
+
+import junit.framework.Assert;
+
+import static junit.framework.Assert.fail;
+import static org.junit.Assert.assertEquals;
+
+public class TestGlob {
+
+  @Test
+  public void testNoGlobs() {
+    assertEquals(Arrays.asList("foo"), Strings.expandGlob("foo"));
+  }
+
+  @Test
+  public void testEmptyGroup() {
+    assertEquals(Arrays.asList(""), Strings.expandGlob(""));
+    assertEquals(Arrays.asList(""), Strings.expandGlob("{}"));
+    assertEquals(Arrays.asList("a"), Strings.expandGlob("a{}"));
+    assertEquals(Arrays.asList("ab"), Strings.expandGlob("a{}b"));
+    assertEquals(Arrays.asList("a"), Strings.expandGlob("{}a"));
+    assertEquals(Arrays.asList("a"), Strings.expandGlob("a{}"));
+    assertEquals(Arrays.asList("", ""), Strings.expandGlob("{,}"));
+    assertEquals(Arrays.asList("ab", "a", "ac"), Strings.expandGlob("a{b,{},c}"));
+  }
+
+  @Test
+  public void testSingleLevel() {
+    assertEquals(Arrays.asList("foobar", "foobaz"), Strings.expandGlob("foo{bar,baz}"));
+    assertEquals(Arrays.asList("startfooend", "startbarend"), Strings.expandGlob("start{foo,bar}end"));
+    assertEquals(Arrays.asList("fooend", "barend"), Strings.expandGlob("{foo,bar}end"));
+    assertEquals(Arrays.asList(
+        "startfooenda", "startfooendb", "startfooendc", "startfooendd",
+        "startbarenda", "startbarendb", "startbarendc", "startbarendd"),
+        Strings.expandGlob("start{foo,bar}end{a,b,c,d}"));
+    assertEquals(Arrays.asList("xa", "xb", "xc", "ya", "yb", "yc"), Strings.expandGlob("{x,y}{a,b,c}"));
+    assertEquals(Arrays.asList("x", "y", "z"), Strings.expandGlob("{x,y,z}"));
+  }
+
+  @Test
+  public void testNested() {
+    assertEquals(Arrays.asList(
+            "startoneend", "startpretwopostend", "startprethreepostend",
+            "startfourend", "startfiveend", "a", "b", "foox", "fooy"),
+            Strings.expandGlob("{start{one,pre{two,three}post,{four,five}}end,a,b,foo{x,y}}"));
+  }
+
+  @Test
+  public void testExtraBraces() {
+    assertEquals(Arrays.asList("x", "y", "z"), Strings.expandGlob("{{x,y,z}}"));
+    assertEquals(Arrays.asList("x", "y", "z"), Strings.expandGlob("{{{x,y,z}}}"));
+    assertEquals(Arrays.asList("startx", "starta", "startb", "starty"), Strings.expandGlob("start{x,{a,b},y}"));
+  }
+
+  @Test
+  public void testCommaInTopLevel() {
+    try {
+      Strings.expandGlob("foo,bar");
+      fail("This should throw");
+    } catch (GlobParseException e) {
+      Assert.assertEquals("Unexpected comma outside of a {} group:\n" +
+          "foo,bar\n" +
+          "---^", e.getMessage());
+    }
+  }
+
+  @Test
+  public void testCommaCornerCases() {
+    // single empty string in each location
+    assertEquals(Arrays.asList("foobar", "foo", "foobaz"), Strings.expandGlob("foo{bar,,baz}"));
+    assertEquals(Arrays.asList("foo", "foobar", "foobaz"), Strings.expandGlob("foo{,bar,baz}"));
+    assertEquals(Arrays.asList("foobar", "foobaz", "foo"), Strings.expandGlob("foo{bar,baz,}"));
+
+    // multiple empty strings
+    assertEquals(Arrays.asList("foobar", "foo", "foo", "foobaz"), Strings.expandGlob("foo{bar,,,baz}"));
+    assertEquals(Arrays.asList("foo", "foo", "foobar", "foobaz"), Strings.expandGlob("foo{,,bar,baz}"));
+    assertEquals(Arrays.asList("foobar", "foobaz", "foo", "foo"), Strings.expandGlob("foo{bar,baz,,}"));
+
+    // between groups
+    assertEquals(Arrays.asList("x", "y", "", "a", "b"), Strings.expandGlob("{{x,y},,{a,b}}"));
+  }
+
+  private void assertNotEnoughCloseBraces(String s) {
+    String expected = "Not enough close braces in: ";
+    try {
+      Strings.expandGlob(s);
+      fail("this should throw");
+    } catch (GlobParseException e) {
+      Assert.assertEquals(expected, e.getMessage().substring(0, expected.length()));
+    }
+  }
+
+  private void assertTooManyCloseBraces(String s) {
+    String expected = "Unexpected closing }:";
+    try {
+      Strings.expandGlob(s);
+      fail("this should throw");
+    } catch (GlobParseException e) {
+      Assert.assertEquals(expected, e.getMessage().substring(0, expected.length()));
+    }
+  }
+
+
+  @Test
+  public void testMismatchedBraces() {
+    assertNotEnoughCloseBraces("{");
+    assertNotEnoughCloseBraces("{}{}{}{{}{}{");
+    assertNotEnoughCloseBraces("foo{bar");
+    assertNotEnoughCloseBraces("foo{{bar}");
+    assertNotEnoughCloseBraces("foo{}{{bar}");
+
+    assertTooManyCloseBraces("{}}{");
+    assertTooManyCloseBraces("}");
+    assertTooManyCloseBraces("{}{}{}}{}{}{");
+    assertTooManyCloseBraces("foo}bar");
+    assertTooManyCloseBraces("foo}}bar}");
+    assertTooManyCloseBraces("foo{}{{bar}}}");
+  }
+
+}

http://git-wip-us.apache.org/repos/asf/parquet-mr/blob/7fc79983/parquet-common/src/test/java/org/apache/parquet/glob/TestWildcardPath.java
----------------------------------------------------------------------
diff --git a/parquet-common/src/test/java/org/apache/parquet/glob/TestWildcardPath.java b/parquet-common/src/test/java/org/apache/parquet/glob/TestWildcardPath.java
new file mode 100644
index 0000000..d8af8b4
--- /dev/null
+++ b/parquet-common/src/test/java/org/apache/parquet/glob/TestWildcardPath.java
@@ -0,0 +1,125 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.parquet.glob;
+
+import org.junit.Test;
+
+import static org.junit.Assert.fail;
+
+
+public class TestWildcardPath {
+
+  private static void assertMatches(WildcardPath wp, String... strings) {
+    for (String s : strings) {
+      if (!wp.matches(s)) {
+        fail(String.format("String '%s' was expected to match '%s'", s, wp));
+      }
+    }
+  }
+
+  private static void assertDoesNotMatch(WildcardPath wp, String... strings) {
+    for (String s : strings) {
+      if (wp.matches(s)) {
+        fail(String.format("String '%s' was not expected to match '%s'", s, wp));
+      }
+    }
+  }
+
+
+  @Test
+  public void testNoWildcards() {
+    WildcardPath wp = new WildcardPath("", "foo", '.');
+    assertMatches(wp, "foo", "foo.x", "foo.x.y");
+    assertDoesNotMatch(wp, "xfoo", "xfoox", "fooa.x.y");
+  }
+
+  @Test
+  public void testStarMatchesEverything() {
+    WildcardPath wp = new WildcardPath("", "*", '.');
+    assertMatches(wp, "", ".", "hi", "foo.bar", "*", "foo.");
+  }
+
+  @Test
+  public void testChildrenPathsMatch() {
+    WildcardPath wp = new WildcardPath("", "x.y.z", '.');
+    assertMatches(wp, "x.y.z", "x.y.z.bar", "x.y.z.bar.baz.bop");
+    assertDoesNotMatch(wp, "x.y.zzzz", "x.y.b", "x.y.a.z", "x.y.zhi.z");
+  }
+
+  @Test
+  public void testEmptyString() {
+    WildcardPath wp = new WildcardPath("", "", '.');
+    assertMatches(wp, "");
+    assertDoesNotMatch(wp, "x");
+  }
+
+  @Test
+  public void testDoubleStarsIgnored() {
+    WildcardPath wp = new WildcardPath("", "foo**bar", '.');
+    assertMatches(wp, "foobar", "fooxyzbar", "foo.x.y.z.bar");
+    assertDoesNotMatch(wp, "fobar", "hi", "foobazr");
+
+    wp = new WildcardPath("", "foo********bar", '.');
+    assertMatches(wp, "foobar", "fooxyzbar", "foo.x.y.z.bar");
+    assertDoesNotMatch(wp, "fobar", "hi", "foobazr");
+  }
+
+  @Test
+  public void testStarsAtBeginAndEnd() {
+    WildcardPath wp = new WildcardPath("", "*x.y.z", '.');
+    assertMatches(wp, "a.b.c.x.y.z", "x.y.z", "zoopx.y.z", "zoopx.y.z.child");
+    assertDoesNotMatch(wp, "a.b.c.x.y", "xy.z", "hi");
+
+    wp = new WildcardPath("", "*.x.y.z", '.');
+    assertMatches(wp, "a.b.c.x.y.z", "foo.x.y.z", "foo.x.y.z.child");
+    assertDoesNotMatch(wp, "x.y.z", "a.b.c.x.y", "xy.z", "hi", "zoopx.y.z", "zoopx.y.z.child");
+
+
+    wp = new WildcardPath("", "x.y.z*", '.');
+    assertMatches(wp, "x.y.z", "x.y.z.foo", "x.y.zoo", "x.y.zoo.bar");
+    assertDoesNotMatch(wp, "a.b.c.x.y.z", "foo.x.y.z", "hi");
+
+    wp = new WildcardPath("", "x.y.z.*", '.');
+    assertMatches(wp, "x.y.z.foo", "x.y.z.bar.baz");
+    assertDoesNotMatch(wp, "x.y.z", "a.b.c.x.y.z", "x.y.zoo", "foo.x.y.z", "hi", "x.y.zoo.bar");
+  }
+
+  @Test
+  public void testComplex() {
+    WildcardPath wp = new WildcardPath("", "*.street", '.');
+    assertMatches(wp,
+        "home.address.street", "home.address.street.number", "work.address.street", "work.address.street.foo",
+        "street.street", "street.street.street.street", "thing.street.thing"
+    );
+
+    assertDoesNotMatch(wp,
+        "home.address.street_2", "home.address.street_2.number", "work.addressstreet", "work.addressstreet.foo", "",
+        "x.y.z.street2", "x.y.z.street2.z"
+    );
+
+    wp = new WildcardPath("", "x.y.*_stat.average", '.');
+    assertMatches(wp, "x.y.z_stat.average", "x.y.foo_stat.average", "x.y.z.a.b_stat.average",
+        "x.y.z.a.b_stat.average.child", "x.y.z._stat.average");
+    assertDoesNotMatch(wp, "x.y.z_stats.average", "x.y.z_stat.averages", "x.y_stat.average", "x.yyy.foo_stat.average");
+
+    wp = new WildcardPath("", "x.y.pre*.bar", '.');
+    assertMatches(wp, "x.y.pre.bar", "x.y.preabc.bar", "x.y.prebar.bar");
+    assertDoesNotMatch(wp, "x.y.pre.baraaaa", "x.y.preabc.baraaaa");
+  }
+}

http://git-wip-us.apache.org/repos/asf/parquet-mr/blob/7fc79983/parquet-thrift/src/main/java/org/apache/parquet/hadoop/thrift/ThriftReadSupport.java
----------------------------------------------------------------------
diff --git a/parquet-thrift/src/main/java/org/apache/parquet/hadoop/thrift/ThriftReadSupport.java b/parquet-thrift/src/main/java/org/apache/parquet/hadoop/thrift/ThriftReadSupport.java
index 871f817..cb9bf66 100644
--- a/parquet-thrift/src/main/java/org/apache/parquet/hadoop/thrift/ThriftReadSupport.java
+++ b/parquet-thrift/src/main/java/org/apache/parquet/hadoop/thrift/ThriftReadSupport.java
@@ -28,6 +28,7 @@ import org.apache.thrift.TBase;
 import org.apache.thrift.protocol.TProtocol;
 
 import org.apache.parquet.Log;
+import org.apache.parquet.Strings;
 import org.apache.parquet.hadoop.api.InitContext;
 import org.apache.parquet.hadoop.api.ReadSupport;
 import org.apache.parquet.io.ParquetDecodingException;
@@ -38,20 +39,29 @@ import org.apache.parquet.thrift.ThriftMetaData;
 import org.apache.parquet.thrift.ThriftRecordConverter;
 import org.apache.parquet.thrift.ThriftSchemaConverter;
 import org.apache.parquet.thrift.projection.FieldProjectionFilter;
+import org.apache.parquet.thrift.projection.StrictFieldProjectionFilter;
 import org.apache.parquet.thrift.projection.ThriftProjectionException;
+import org.apache.parquet.thrift.projection.deprecated.DeprecatedFieldProjectionFilter;
 import org.apache.parquet.thrift.struct.ThriftType.StructType;
 
 public class ThriftReadSupport<T> extends ReadSupport<T> {
   private static final Log LOG = Log.getLog(ThriftReadSupport.class);
 
   /**
-   * configuration key for thrift read projection schema
+   * Deprecated. Use {@link #STRICT_THRIFT_COLUMN_FILTER_KEY}
+   * Accepts a ";" delimited list of globs in the syntax implemented by {@link DeprecatedFieldProjectionFilter}
    */
-  public static final String THRIFT_COLUMN_FILTER_KEY = "parquet.thrift.column.filter";
+  @Deprecated
+  public static final String DEPRECATED_THRIFT_COLUMN_FILTER_KEY = "parquet.thrift.column.filter";
+
+  /**
+   * Accepts a ";" delimited list of glob paths, in the syntax implemented by {@link StrictFieldProjectionFilter}
+   */
+  public static final String STRICT_THRIFT_COLUMN_FILTER_KEY = "parquet.thrift.column.projection.globs";
+
   private static final String RECORD_CONVERTER_DEFAULT = TBaseRecordConverter.class.getName();
   public static final String THRIFT_READ_CLASS_KEY = "parquet.thrift.read.class";
 
-
   /**
    * A {@link ThriftRecordConverter} builds an object by working with {@link TProtocol}. The default
    * implementation creates standard Apache Thrift {@link TBase} objects; to support alternatives, such
@@ -87,6 +97,43 @@ public class ThriftReadSupport<T> extends ReadSupport<T> {
     conf.set(RECORD_CONVERTER_CLASS_KEY, klass.getName());
   }
 
+  @Deprecated
+  public static void setProjectionPushdown(JobConf jobConf, String projectionString) {
+    jobConf.set(DEPRECATED_THRIFT_COLUMN_FILTER_KEY, projectionString);
+  }
+
+  public static void setStrictFieldProjectionFilter(Configuration conf, String semicolonDelimitedGlobs) {
+    conf.set(STRICT_THRIFT_COLUMN_FILTER_KEY, semicolonDelimitedGlobs);
+  }
+
+  public static FieldProjectionFilter getFieldProjectionFilter(Configuration conf) {
+    String deprecated = conf.get(DEPRECATED_THRIFT_COLUMN_FILTER_KEY);
+    String strict = conf.get(STRICT_THRIFT_COLUMN_FILTER_KEY);
+
+    if (Strings.isNullOrEmpty(deprecated) && Strings.isNullOrEmpty(strict)) {
+      return null;
+    }
+
+    if(!Strings.isNullOrEmpty(deprecated) && !Strings.isNullOrEmpty(strict)) {
+      throw new ThriftProjectionException(
+          "You cannot provide both "
+              + DEPRECATED_THRIFT_COLUMN_FILTER_KEY
+              + " and "
+              + STRICT_THRIFT_COLUMN_FILTER_KEY
+              +"! "
+              + DEPRECATED_THRIFT_COLUMN_FILTER_KEY
+              + " is deprecated."
+      );
+    }
+
+    if (!Strings.isNullOrEmpty(deprecated)) {
+      LOG.warn(String.format("Using %s is deprecated. Please see the docs for %s!",
+          DEPRECATED_THRIFT_COLUMN_FILTER_KEY, STRICT_THRIFT_COLUMN_FILTER_KEY));
+      return new DeprecatedFieldProjectionFilter(deprecated);
+    }
+
+    return StrictFieldProjectionFilter.fromSemicolonDelimitedString(strict);
+  }
 
   /**
    * used from hadoop
@@ -102,29 +149,31 @@ public class ThriftReadSupport<T> extends ReadSupport<T> {
     this.thriftClass = thriftClass;
   }
 
-
-
   @Override
   public org.apache.parquet.hadoop.api.ReadSupport.ReadContext init(InitContext context) {
     final Configuration configuration = context.getConfiguration();
     final MessageType fileMessageType = context.getFileSchema();
     MessageType requestedProjection = fileMessageType;
     String partialSchemaString = configuration.get(ReadSupport.PARQUET_READ_SCHEMA);
-    String projectionFilterString = configuration.get(THRIFT_COLUMN_FILTER_KEY);
 
-    if (partialSchemaString != null && projectionFilterString != null)
-      throw new ThriftProjectionException("PARQUET_READ_SCHEMA and THRIFT_COLUMN_FILTER_KEY are both specified, should use only one.");
+    FieldProjectionFilter projectionFilter = getFieldProjectionFilter(configuration);
+
+    if (partialSchemaString != null && projectionFilter != null) {
+      throw new ThriftProjectionException(
+          String.format("You cannot provide both a partial schema and field projection filter."
+                  + "Only one of (%s, %s, %s) should be set.",
+              PARQUET_READ_SCHEMA, STRICT_THRIFT_COLUMN_FILTER_KEY, DEPRECATED_THRIFT_COLUMN_FILTER_KEY));
+    }
 
     //set requestedProjections only when it's specified
     if (partialSchemaString != null) {
       requestedProjection = getSchemaForRead(fileMessageType, partialSchemaString);
-    } else if (projectionFilterString != null && !projectionFilterString.isEmpty()) {
-      FieldProjectionFilter fieldProjectionFilter = new FieldProjectionFilter(projectionFilterString);
+    } else if (projectionFilter != null) {
       try {
         initThriftClassFromMultipleFiles(context.getKeyValueMetadata(), configuration);
-        requestedProjection =  getProjectedSchema(fieldProjectionFilter);
+        requestedProjection =  getProjectedSchema(projectionFilter);
       } catch (ClassNotFoundException e) {
-        throw new ThriftProjectionException("can not find thriftClass from configuration");
+        throw new ThriftProjectionException("can not find thriftClass from configuration", e);
       }
     }
 
@@ -132,6 +181,7 @@ public class ThriftReadSupport<T> extends ReadSupport<T> {
     return new ReadContext(schemaForRead);
   }
 
+  @SuppressWarnings("unchecked")
   protected MessageType getProjectedSchema(FieldProjectionFilter fieldProjectionFilter) {
     return new ThriftSchemaConverter(fieldProjectionFilter).convert((Class<TBase<?, ?>>)thriftClass);
   }
@@ -188,8 +238,4 @@ public class ThriftReadSupport<T> extends ReadSupport<T> {
       throw new RuntimeException("Unable to create Thrift Converter for Thrift metadata " + thriftMetaData, t);
     }
   }
-
-  public static void setProjectionPushdown(JobConf jobConf, String projectionString) {
-    jobConf.set(THRIFT_COLUMN_FILTER_KEY, projectionString);
-  }
 }

http://git-wip-us.apache.org/repos/asf/parquet-mr/blob/7fc79983/parquet-thrift/src/main/java/org/apache/parquet/thrift/ThriftSchemaConvertVisitor.java
----------------------------------------------------------------------
diff --git a/parquet-thrift/src/main/java/org/apache/parquet/thrift/ThriftSchemaConvertVisitor.java b/parquet-thrift/src/main/java/org/apache/parquet/thrift/ThriftSchemaConvertVisitor.java
index ec5a371..3081d87 100644
--- a/parquet-thrift/src/main/java/org/apache/parquet/thrift/ThriftSchemaConvertVisitor.java
+++ b/parquet-thrift/src/main/java/org/apache/parquet/thrift/ThriftSchemaConvertVisitor.java
@@ -35,6 +35,7 @@ import static org.apache.parquet.schema.Types.primitive;
 import java.util.ArrayList;
 import java.util.List;
 
+import org.apache.parquet.Preconditions;
 import org.apache.parquet.schema.GroupType;
 import org.apache.parquet.schema.MessageType;
 import org.apache.parquet.schema.OriginalType;
@@ -49,25 +50,21 @@ import org.apache.parquet.thrift.struct.ThriftField;
 import org.apache.parquet.thrift.struct.ThriftType;
 
 /**
- * Visitor Class for converting a thrift definiton to parquet message type.
+ * Visitor Class for converting a thrift definition to parquet message type.
  * Projection can be done by providing a {@link FieldProjectionFilter}
  *
  * @author Tianshuo Deng
  */
 public class ThriftSchemaConvertVisitor implements ThriftType.TypeVisitor {
+  private final FieldProjectionFilter fieldProjectionFilter;
+  private final FieldsPath currentFieldPath = new FieldsPath();
 
-  public FieldProjectionFilter getFieldProjectionFilter() {
-    return fieldProjectionFilter;
-  }
-
-  FieldProjectionFilter fieldProjectionFilter;
-  Type currentType;
-  FieldsPath currentFieldPath = new FieldsPath();
-  Type.Repetition currentRepetition = Type.Repetition.REPEATED;//MessageType is repeated GroupType
-  String currentName = "ParquetSchema";
+  private Type currentType;
+  private Type.Repetition currentRepetition = Type.Repetition.REPEATED; // MessageType is repeated GroupType
+  private String currentName = "ParquetSchema";
 
   public ThriftSchemaConvertVisitor(FieldProjectionFilter fieldProjectionFilter) {
-    this.fieldProjectionFilter = fieldProjectionFilter;
+    this.fieldProjectionFilter = Preconditions.checkNotNull(fieldProjectionFilter, "fieldProjectionFilter");
   }
 
   @Override
@@ -120,9 +117,7 @@ public class ThriftSchemaConvertVisitor implements ThriftType.TypeVisitor {
     currentRepetition = REPEATED;
     setElemField.getType().accept(this);
     //after conversion, currentType is the nested type
-    if (currentType == null) {
-      return;
-    } else {
+    if (currentType != null) {
       currentType = listType(setRepetition, setName, currentType);
     }
   }
@@ -136,12 +131,9 @@ public class ThriftSchemaConvertVisitor implements ThriftType.TypeVisitor {
     currentRepetition = REPEATED;
     setElemField.getType().accept(this);
     //after conversion, currentType is the nested type
-    if (currentType == null) {
-      return;
-    } else {
+    if (currentType != null) {
       currentType = listType(listRepetition, listName, currentType);
     }
-
   }
 
   public MessageType getConvertedMessageType() {
@@ -173,10 +165,8 @@ public class ThriftSchemaConvertVisitor implements ThriftType.TypeVisitor {
 
   private List<Type> getFieldsTypes(List<ThriftField> fields) {
     List<Type> types = new ArrayList<Type>();
-    for (int i = 0; i < fields.size(); i++) {
-      ThriftField field = fields.get(i);
-      Type.Repetition rep = getRepetition(field);
-      currentRepetition = rep;
+    for (ThriftField field : fields) {
+      currentRepetition = getRepetition(field);
       currentName = field.getName();
       currentFieldPath.push(field);
       field.getType().accept(this);
@@ -190,7 +180,7 @@ public class ThriftSchemaConvertVisitor implements ThriftType.TypeVisitor {
   }
 
   private boolean isCurrentlyMatchedFilter(){
-     if(!fieldProjectionFilter.isMatched(currentFieldPath)){
+     if(!fieldProjectionFilter.keep(currentFieldPath)){
        currentType = null;
        return false;
      }

http://git-wip-us.apache.org/repos/asf/parquet-mr/blob/7fc79983/parquet-thrift/src/main/java/org/apache/parquet/thrift/ThriftSchemaConverter.java
----------------------------------------------------------------------
diff --git a/parquet-thrift/src/main/java/org/apache/parquet/thrift/ThriftSchemaConverter.java b/parquet-thrift/src/main/java/org/apache/parquet/thrift/ThriftSchemaConverter.java
index c580b54..f9f23eb 100644
--- a/parquet-thrift/src/main/java/org/apache/parquet/thrift/ThriftSchemaConverter.java
+++ b/parquet-thrift/src/main/java/org/apache/parquet/thrift/ThriftSchemaConverter.java
@@ -26,8 +26,6 @@ import org.apache.thrift.TUnion;
 
 import org.apache.parquet.schema.MessageType;
 import org.apache.parquet.thrift.projection.FieldProjectionFilter;
-import org.apache.parquet.thrift.projection.PathGlobPattern;
-import org.apache.parquet.thrift.projection.ThriftProjectionException;
 import org.apache.parquet.thrift.struct.ThriftField;
 import org.apache.parquet.thrift.struct.ThriftField.Requirement;
 import org.apache.parquet.thrift.struct.ThriftType;
@@ -44,15 +42,10 @@ import java.util.List;
  * a {@link FieldProjectionFilter} can be specified for projection pushdown.
  */
 public class ThriftSchemaConverter {
-
   private final FieldProjectionFilter fieldProjectionFilter;
 
-  public static <T extends TBase<?,?>> StructOrUnionType structOrUnionType(Class<T> klass) {
-    return TUnion.class.isAssignableFrom(klass) ? StructOrUnionType.UNION : StructOrUnionType.STRUCT;
-  }
-
   public ThriftSchemaConverter() {
-    this(new FieldProjectionFilter());
+    this(FieldProjectionFilter.ALL_COLUMNS);
   }
 
   public ThriftSchemaConverter(FieldProjectionFilter fieldProjectionFilter) {
@@ -60,52 +53,43 @@ public class ThriftSchemaConverter {
   }
 
   public MessageType convert(Class<? extends TBase<?, ?>> thriftClass) {
-    return convert(new ThriftStructConverter().toStructType(thriftClass));
+    return convert(toStructType(thriftClass));
   }
 
   public MessageType convert(StructType thriftClass) {
     ThriftSchemaConvertVisitor visitor = new ThriftSchemaConvertVisitor(fieldProjectionFilter);
     thriftClass.accept(visitor);
     MessageType convertedMessageType = visitor.getConvertedMessageType();
-    checkUnmatchedProjectionFilter(visitor.getFieldProjectionFilter());
+    fieldProjectionFilter.assertNoUnmatchedPatterns();
     return convertedMessageType;
   }
 
-  private void checkUnmatchedProjectionFilter(FieldProjectionFilter filter) {
-    List<PathGlobPattern> unmatched = filter.getUnMatchedPatterns();
-    if (unmatched.size() != 0) {
-      throw new ThriftProjectionException("unmatched projection filters: " + unmatched.toString());
-    }
+  public static <T extends TBase<?,?>> StructOrUnionType structOrUnionType(Class<T> klass) {
+    return TUnion.class.isAssignableFrom(klass) ? StructOrUnionType.UNION : StructOrUnionType.STRUCT;
   }
 
-  public ThriftType.StructType toStructType(Class<? extends TBase<?, ?>> thriftClass) {
-    return new ThriftStructConverter().toStructType(thriftClass);
+  public static ThriftType.StructType toStructType(Class<? extends TBase<?, ?>> thriftClass) {
+    final TStructDescriptor struct = TStructDescriptor.getInstance(thriftClass);
+    return toStructType(struct);
   }
 
-  private static class ThriftStructConverter {
-
-    public ThriftType.StructType toStructType(Class<? extends TBase<?, ?>> thriftClass) {
-      final TStructDescriptor struct = TStructDescriptor.getInstance(thriftClass);
-      return toStructType(struct);
-    }
-
-    private StructType toStructType(TStructDescriptor struct) {
-      List<Field> fields = struct.getFields();
-      List<ThriftField> children = new ArrayList<ThriftField>(fields.size());
-      for (int i = 0; i < fields.size(); i++) {
-        Field field = fields.get(i);
-        Requirement req =
-                field.getFieldMetaData() == null ?
-                        Requirement.OPTIONAL :
-                        Requirement.fromType(field.getFieldMetaData().requirementType);
-        children.add(toThriftField(field.getName(), field, req));
-      }
-      return new StructType(children, structOrUnionType(struct.getThriftClass()));
+  private static StructType toStructType(TStructDescriptor struct) {
+    List<Field> fields = struct.getFields();
+    List<ThriftField> children = new ArrayList<ThriftField>(fields.size());
+    for (int i = 0; i < fields.size(); i++) {
+      Field field = fields.get(i);
+      Requirement req =
+          field.getFieldMetaData() == null ?
+              Requirement.OPTIONAL :
+              Requirement.fromType(field.getFieldMetaData().requirementType);
+      children.add(toThriftField(field.getName(), field, req));
     }
+    return new StructType(children, structOrUnionType(struct.getThriftClass()));
+  }
 
-    private ThriftField toThriftField(String name, Field field, ThriftField.Requirement requirement) {
-      ThriftType type;
-      switch (ThriftTypeID.fromByte(field.getType())) {
+  private static ThriftField toThriftField(String name, Field field, ThriftField.Requirement requirement) {
+    ThriftType type;
+    switch (ThriftTypeID.fromByte(field.getType())) {
       case STOP:
       case VOID:
       default:
@@ -138,8 +122,8 @@ public class ThriftSchemaConverter {
         final Field mapKeyField = field.getMapKeyField();
         final Field mapValueField = field.getMapValueField();
         type = new ThriftType.MapType(
-                toThriftField(mapKeyField.getName(), mapKeyField, requirement),
-                toThriftField(mapValueField.getName(), mapValueField, requirement));
+            toThriftField(mapKeyField.getName(), mapKeyField, requirement),
+            toThriftField(mapValueField.getName(), mapValueField, requirement));
         break;
       case SET:
         final Field setElemField = field.getSetElemField();
@@ -157,9 +141,8 @@ public class ThriftSchemaConverter {
         }
         type = new EnumType(values);
         break;
-      }
-      return new ThriftField(name, field.getId(), requirement, type);
     }
+    return new ThriftField(name, field.getId(), requirement, type);
   }
 }
 

http://git-wip-us.apache.org/repos/asf/parquet-mr/blob/7fc79983/parquet-thrift/src/main/java/org/apache/parquet/thrift/projection/FieldProjectionFilter.java
----------------------------------------------------------------------
diff --git a/parquet-thrift/src/main/java/org/apache/parquet/thrift/projection/FieldProjectionFilter.java b/parquet-thrift/src/main/java/org/apache/parquet/thrift/projection/FieldProjectionFilter.java
index e038406..5498c8e 100644
--- a/parquet-thrift/src/main/java/org/apache/parquet/thrift/projection/FieldProjectionFilter.java
+++ b/parquet-thrift/src/main/java/org/apache/parquet/thrift/projection/FieldProjectionFilter.java
@@ -1,4 +1,4 @@
-/* 
+/*
  * Licensed to the Apache Software Foundation (ASF) under one
  * or more contributor license agreements.  See the NOTICE file
  * distributed with this work for additional information
@@ -6,9 +6,9 @@
  * to you under the Apache License, Version 2.0 (the
  * "License"); you may not use this file except in compliance
  * with the License.  You may obtain a copy of the License at
- * 
+ *
  *   http://www.apache.org/licenses/LICENSE-2.0
- * 
+ *
  * Unless required by applicable law or agreed to in writing,
  * software distributed under the License is distributed on an
  * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
@@ -18,76 +18,44 @@
  */
 package org.apache.parquet.thrift.projection;
 
-import java.util.LinkedList;
-import java.util.List;
 
 /**
- * Filter thrift attributes using glob syntax.
+ * A field projection filter decides whether a thrift field (column) should
+ * be included when reading thrift data. It is used to implement projection push down.
  *
- * @author Tianshuo Deng
+ * See {@link StrictFieldProjectionFilter} and
+ * {@link parquet.thrift.projection.deprecated.DeprecatedFieldProjectionFilter}
  */
-public class FieldProjectionFilter {
-  public static final String PATTERN_SEPARATOR = ";";
-  List<PathGlobPatternStatus> filterPatterns;
+public interface FieldProjectionFilter {
 
   /**
-   * Class for remembering if a glob pattern has matched anything.
-   * If there is an invalid glob pattern that matches nothing, it should throw.
+   * Decide whether to keep the field (column) represented by path.
+   *
+   * @param path the path to the field (column)
+   * @return true to keep, false to discard (project out)
    */
-  private static class PathGlobPatternStatus {
-    PathGlobPattern pattern;
-    boolean hasMatchingPath = false;
-
-    PathGlobPatternStatus(String pattern) {
-      this.pattern = new PathGlobPattern(pattern);
-    }
-
-    public boolean matches(FieldsPath path) {
-      if (this.pattern.matches(path.toString())) {
-        this.hasMatchingPath = true;
-        return true;
-      } else {
-        return false;
-      }
-    }
-  }
-
-  public FieldProjectionFilter() {
-    filterPatterns = new LinkedList<PathGlobPatternStatus>();
-  }
-
-  public FieldProjectionFilter(String filterDescStr) {
-    filterPatterns = new LinkedList<PathGlobPatternStatus>();
-
-    if (filterDescStr == null || filterDescStr.isEmpty())
-      return;
+  boolean keep(FieldsPath path);
 
-    String[] rawPatterns = filterDescStr.split(PATTERN_SEPARATOR);
-    for (String rawPattern : rawPatterns) {
-      filterPatterns.add(new PathGlobPatternStatus(rawPattern));
-    }
-  }
+  /**
+   * Should throw a ThriftProjectionException if this FieldProjectionFilter has remaining patterns / columns
+   * that didn't match any of paths passed to {@link #keep(FieldsPath)}.
+   *
+   * Will be called once after all paths have been passed to {@link #keep(FieldsPath)}.
+   */
+  void assertNoUnmatchedPatterns() throws ThriftProjectionException;
 
-  public boolean isMatched(FieldsPath path) {
-    if (filterPatterns.size() == 0)
+  /**
+   * A filter that keeps all of the columns.
+   */
+  public static final FieldProjectionFilter ALL_COLUMNS = new FieldProjectionFilter() {
+    @Override
+    public boolean keep(FieldsPath path) {
       return true;
-
-    for (int i = 0; i < filterPatterns.size(); i++) {
-
-      if (filterPatterns.get(i).matches(path))
-        return true;
     }
-    return false;
-  }
 
-  public List<PathGlobPattern> getUnMatchedPatterns() {
-    List<PathGlobPattern> unmatched = new LinkedList<PathGlobPattern>();
-    for (PathGlobPatternStatus p : filterPatterns) {
-      if (!p.hasMatchingPath) {
-        unmatched.add(p.pattern);
-      }
-    }
-    return unmatched;
-  }
+    @Override
+    public void assertNoUnmatchedPatterns() throws ThriftProjectionException {
 
+    }
+  };
 }

http://git-wip-us.apache.org/repos/asf/parquet-mr/blob/7fc79983/parquet-thrift/src/main/java/org/apache/parquet/thrift/projection/FieldsPath.java
----------------------------------------------------------------------
diff --git a/parquet-thrift/src/main/java/org/apache/parquet/thrift/projection/FieldsPath.java b/parquet-thrift/src/main/java/org/apache/parquet/thrift/projection/FieldsPath.java
index ac20b0d..aa5ebaf 100644
--- a/parquet-thrift/src/main/java/org/apache/parquet/thrift/projection/FieldsPath.java
+++ b/parquet-thrift/src/main/java/org/apache/parquet/thrift/projection/FieldsPath.java
@@ -1,4 +1,4 @@
-/* 
+/*
  * Licensed to the Apache Software Foundation (ASF) under one
  * or more contributor license agreements.  See the NOTICE file
  * distributed with this work for additional information
@@ -6,9 +6,9 @@
  * to you under the Apache License, Version 2.0 (the
  * "License"); you may not use this file except in compliance
  * with the License.  You may obtain a copy of the License at
- * 
+ *
  *   http://www.apache.org/licenses/LICENSE-2.0
- * 
+ *
  * Unless required by applicable law or agreed to in writing,
  * software distributed under the License is distributed on an
  * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
@@ -24,61 +24,64 @@ import org.apache.parquet.thrift.struct.ThriftField;
 import org.apache.parquet.thrift.struct.ThriftType;
 
 /**
- * represent field path for thrift field
+ * Represents a column path as a sequence of fields.
  *
  * @author Tianshuo Deng
  */
 public class FieldsPath {
-  ArrayList<ThriftField> fields = new ArrayList<ThriftField>();
+  private final ArrayList<ThriftField> fields = new ArrayList<ThriftField>();
 
-  public void push(ThriftField field) {
-    this.fields.add(field);
+  public void push(ThriftField f) {
+    this.fields.add(f);
   }
 
   public ThriftField pop() {
     return this.fields.remove(fields.size() - 1);
   }
 
-  @Override
-  public String toString() {
-    StringBuffer pathStrBuffer = new StringBuffer();
+  public ArrayList<ThriftField> getFields() {
+    return fields;
+  }
+
+  public String toDelimitedString(String delim) {
+    StringBuilder delimited = new StringBuilder();
     for (int i = 0; i < fields.size(); i++) {
       ThriftField currentField = fields.get(i);
       if (i > 0) {
         ThriftField previousField = fields.get(i - 1);
-        if (isKeyFieldOfMap(currentField, previousField)) {
-          pathStrBuffer.append("key/");
+        if (FieldsPath.isKeyFieldOfMap(currentField, previousField)) {
+          delimited.append("key");
+          delimited.append(delim);
           continue;
-        } else if (isValueFieldOfMap(currentField, previousField)) {
-          pathStrBuffer.append("value/");
+        } else if (FieldsPath.isValueFieldOfMap(currentField, previousField)) {
+          delimited.append("value");
+          delimited.append(delim);
           continue;
         }
       }
-
-      pathStrBuffer.append(currentField.getName()).append("/");
+      delimited.append(currentField.getName()).append(delim);
     }
 
-    if (pathStrBuffer.length() == 0) {
+    if (delimited.length() == 0) {
       return "";
     } else {
-      String pathStr = pathStrBuffer.substring(0, pathStrBuffer.length() - 1);
-      return pathStr;
+      return delimited.substring(0, delimited.length() - 1);
     }
   }
 
-  private boolean isValueFieldOfMap(ThriftField currentField, ThriftField previousField) {
+  @Override
+  public String toString() {
+    return toDelimitedString(".");
+  }
+
+  private static boolean isValueFieldOfMap(ThriftField currentField, ThriftField previousField) {
     ThriftType previousType = previousField.getType();
-    if(!(previousType instanceof ThriftType.MapType)) {
-      return false;
-    }
-    return ((ThriftType.MapType)previousType).getValue()==currentField;
+    return previousType instanceof ThriftType.MapType && ((ThriftType.MapType) previousType).getValue() == currentField;
   }
 
-  private boolean isKeyFieldOfMap(ThriftField currentField, ThriftField previousField) {
+  private static boolean isKeyFieldOfMap(ThriftField currentField, ThriftField previousField) {
     ThriftType previousType = previousField.getType();
-    if(!(previousType instanceof ThriftType.MapType)) {
-      return false;
-    }
-    return ((ThriftType.MapType)previousType).getKey()==currentField;
+    return previousType instanceof ThriftType.MapType && ((ThriftType.MapType) previousType).getKey() == currentField;
   }
+
 }

http://git-wip-us.apache.org/repos/asf/parquet-mr/blob/7fc79983/parquet-thrift/src/main/java/org/apache/parquet/thrift/projection/PathGlobPattern.java
----------------------------------------------------------------------
diff --git a/parquet-thrift/src/main/java/org/apache/parquet/thrift/projection/PathGlobPattern.java b/parquet-thrift/src/main/java/org/apache/parquet/thrift/projection/PathGlobPattern.java
deleted file mode 100644
index 0d35b82..0000000
--- a/parquet-thrift/src/main/java/org/apache/parquet/thrift/projection/PathGlobPattern.java
+++ /dev/null
@@ -1,184 +0,0 @@
-/* 
- * Licensed to the Apache Software Foundation (ASF) under one
- * or more contributor license agreements.  See the NOTICE file
- * distributed with this work for additional information
- * regarding copyright ownership.  The ASF licenses this file
- * to you under the Apache License, Version 2.0 (the
- * "License"); you may not use this file except in compliance
- * with the License.  You may obtain a copy of the License at
- * 
- *   http://www.apache.org/licenses/LICENSE-2.0
- * 
- * Unless required by applicable law or agreed to in writing,
- * software distributed under the License is distributed on an
- * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
- * KIND, either express or implied.  See the License for the
- * specific language governing permissions and limitations
- * under the License.
- */
-package org.apache.parquet.thrift.projection;
-
-import org.apache.hadoop.fs.GlobPattern;
-
-import java.util.regex.Pattern;
-import java.util.regex.PatternSyntaxException;
-
-/**
- * Enhanced version of GlobPattern class that is defined in hadoop with extra capability of matching
- * full path separated by '/', and double star matching
- * @author Tianshuo Deng
- */
-
-public class PathGlobPattern {
-  private static final char BACKSLASH = '\\';
-  private static final char PATH_SEPARATOR = '/';
-  private Pattern compiled;
-  private boolean hasWildcard = false;
-
-  /**
-   * Construct the glob pattern object with a glob pattern string
-   *
-   * @param globPattern the glob pattern string
-   */
-  public PathGlobPattern(String globPattern) {
-    set(globPattern);
-  }
-
-  /**
-   * Compile glob pattern string
-   *
-   * @param globPattern the glob pattern
-   * @return the pattern object
-   */
-  public static Pattern compile(String globPattern) {
-    return new GlobPattern(globPattern).compiled();
-  }
-
-  private static void error(String message, String pattern, int pos) {
-    throw new PatternSyntaxException(message, pattern, pos);
-  }
-
-  /**
-   * @return the compiled pattern
-   */
-  public Pattern compiled() {
-    return compiled;
-  }
-
-  /**
-   * Match input against the compiled glob pattern
-   *
-   * @param s input chars
-   * @return true for successful matches
-   */
-  public boolean matches(CharSequence s) {
-    return compiled.matcher(s).matches();
-  }
-
-  /**
-   * Set and compile a glob pattern
-   *
-   * @param glob the glob pattern string
-   */
-  public void set(String glob) {
-    StringBuilder regex = new StringBuilder();
-    int setOpen = 0;
-    int curlyOpen = 0;
-    int len = glob.length();
-    hasWildcard = false;
-
-    for (int i = 0; i < len; i++) {
-      char c = glob.charAt(i);
-
-      switch (c) {
-        case BACKSLASH:
-          if (++i >= len) {
-            error("Missing escaped character", glob, i);
-          }
-          regex.append(c).append(glob.charAt(i));
-          continue;
-        case '.':
-        case '$':
-        case '(':
-        case ')':
-        case '|':
-        case '+':
-          // escape regex special chars that are not glob special chars
-          regex.append(BACKSLASH);
-          break;
-        case '*':
-          if (i + 1 < len && glob.charAt(i + 1) == '*') {
-            regex.append('.');
-            i++;
-            break;
-          }
-          regex.append("[^" + PATH_SEPARATOR + "]");
-          hasWildcard = true;
-          break;
-        case '?':
-          regex.append('.');
-          hasWildcard = true;
-          continue;
-        case '{': // start of a group
-          regex.append("(?:"); // non-capturing
-          curlyOpen++;
-          hasWildcard = true;
-          continue;
-        case ',':
-          regex.append(curlyOpen > 0 ? '|' : c);
-          continue;
-        case '}':
-          if (curlyOpen > 0) {
-            // end of a group
-            curlyOpen--;
-            regex.append(")");
-            continue;
-          }
-          break;
-        case '[':
-          if (setOpen > 0) {
-            error("Unclosed character class", glob, i);
-          }
-          setOpen++;
-          hasWildcard = true;
-          break;
-        case '^': // ^ inside [...] can be unescaped
-          if (setOpen == 0) {
-            regex.append(BACKSLASH);
-          }
-          break;
-        case '!': // [! needs to be translated to [^
-          regex.append(setOpen > 0 && '[' == glob.charAt(i - 1) ? '^' : '!');
-          continue;
-        case ']':
-          // Many set errors like [][] could not be easily detected here,
-          // as []], []-] and [-] are all valid POSIX glob and java regex.
-          // We'll just let the regex compiler do the real work.
-          setOpen = 0;
-          break;
-        default:
-      }
-      regex.append(c);
-    }
-
-    if (setOpen > 0) {
-      error("Unclosed character class", glob, len);
-    }
-    if (curlyOpen > 0) {
-      error("Unclosed group", glob, len);
-    }
-    compiled = Pattern.compile(regex.toString());
-  }
-
-  @Override
-  public String toString() {
-    return compiled.toString();
-  }
-
-  /**
-   * @return true if this is a wildcard pattern (with special chars)
-   */
-  public boolean hasWildcard() {
-    return hasWildcard;
-  }
-}


Mime
View raw message