parquet-commits mailing list archives

From jul...@apache.org
Subject git commit: PARQUET-92: Pig parallel control
Date Mon, 22 Sep 2014 18:21:38 GMT
Repository: incubator-parquet-mr
Updated Branches:
  refs/heads/master 9cdcf3bbd -> 3dc223cc8


PARQUET-92: Pig parallel control

The parallelism for reading footers was hard-coded to 5, which isn't optimal when using Pig
with S3. This adds a property, parquet.metadata.read.parallelism, to adjust it (default: 5).
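
The lookup-with-default pattern this change introduces can be sketched in isolation as follows. This is a minimal illustration, not the project's code: java.util.Properties stands in for the Hadoop Configuration object so the example runs without Hadoop on the classpath, the class name ParallelismSketch and the helper readParallelism are hypothetical, and the use of invokeAll is a simplification of the submit-and-collect loop in ParquetFileReader. Only the property name and the default of 5 come from the commit.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelismSketch {
  // Property name and default value as they appear in the commit.
  static final String PARQUET_READ_PARALLELISM = "parquet.metadata.read.parallelism";
  static final int DEFAULT_PARALLELISM = 5;

  // Hypothetical helper: Properties is a stand-in for Hadoop's Configuration.
  static int readParallelism(Properties props) {
    String raw = props.getProperty(PARQUET_READ_PARALLELISM);
    return raw == null ? DEFAULT_PARALLELISM : Integer.parseInt(raw.trim());
  }

  // Runs every task on a fixed-size pool and returns results in submission order.
  static <T> List<T> runAllInParallel(int parallelism, List<Callable<T>> toRun)
      throws ExecutionException, InterruptedException {
    ExecutorService pool = Executors.newFixedThreadPool(parallelism);
    try {
      List<Future<T>> futures = pool.invokeAll(toRun); // blocks until all tasks finish
      List<T> results = new ArrayList<T>();
      for (Future<T> future : futures) {
        results.add(future.get());
      }
      return results;
    } finally {
      pool.shutdownNow();
    }
  }

  public static void main(String[] args) throws Exception {
    Properties props = new Properties();
    props.setProperty(PARQUET_READ_PARALLELISM, "20"); // raise from the default of 5

    List<Callable<Integer>> tasks = new ArrayList<Callable<Integer>>();
    for (int i = 0; i < 50; i++) {
      final int n = i;
      tasks.add(() -> n * 2); // placeholder for "read one footer"
    }
    List<Integer> out = runAllInParallel(readParallelism(props), tasks);
    System.out.println(out.size() + " results collected");
  }
}
```

Executors.newFixedThreadPool rejects a size of zero or less with an IllegalArgumentException, so a misconfigured value would fail at pool creation rather than silently fall back to the default. In an actual Hadoop job the value would presumably be set on the job configuration, for example with Configuration.setInt.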

JIRA: https://issues.apache.org/jira/browse/PARQUET-92

Author: Daniel Weeks <dweeks@netflix.com>

Closes #57 from dcw-netflix/pig-parallel-control and squashes the following commits:

e49087c [Daniel Weeks] Update ParquetFileReader.java
ec4f8ca [Daniel Weeks] Added configurable control of parallelism
d37a6de [Daniel Weeks] Resetting pom to main
0c1572e [Daniel Weeks] Merge remote-tracking branch 'upstream/master'
98c6607 [Daniel Weeks] Merge remote-tracking branch 'upstream/master'
96ba602 [Daniel Weeks] Disabled projects that don't compile


Project: http://git-wip-us.apache.org/repos/asf/incubator-parquet-mr/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-parquet-mr/commit/3dc223cc
Tree: http://git-wip-us.apache.org/repos/asf/incubator-parquet-mr/tree/3dc223cc
Diff: http://git-wip-us.apache.org/repos/asf/incubator-parquet-mr/diff/3dc223cc

Branch: refs/heads/master
Commit: 3dc223cc85022e11dc6cd954784e715e3a49fe5c
Parents: 9cdcf3b
Author: Daniel Weeks <dweeks@netflix.com>
Authored: Mon Sep 22 11:21:20 2014 -0700
Committer: julien <julien@twitter.com>
Committed: Mon Sep 22 11:21:20 2014 -0700

----------------------------------------------------------------------
 .../src/main/java/parquet/hadoop/ParquetFileReader.java       | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-parquet-mr/blob/3dc223cc/parquet-hadoop/src/main/java/parquet/hadoop/ParquetFileReader.java
----------------------------------------------------------------------
diff --git a/parquet-hadoop/src/main/java/parquet/hadoop/ParquetFileReader.java b/parquet-hadoop/src/main/java/parquet/hadoop/ParquetFileReader.java
index 49f1fab..74d65fe 100644
--- a/parquet-hadoop/src/main/java/parquet/hadoop/ParquetFileReader.java
+++ b/parquet-hadoop/src/main/java/parquet/hadoop/ParquetFileReader.java
@@ -79,6 +79,8 @@ public class ParquetFileReader implements Closeable {
 
   private static final Log LOG = Log.getLog(ParquetFileReader.class);
 
+  public static String PARQUET_READ_PARALLELISM = "parquet.metadata.read.parallelism";
+
   private static ParquetMetadataConverter parquetMetadataConverter = new ParquetMetadataConverter();
 
   /**
@@ -151,7 +153,7 @@ public class ParquetFileReader implements Closeable {
 
     Map<Path, Footer> cache = new HashMap<Path, Footer>();
     try {
-      List<Map<Path, Footer>> footersFromSummaries = runAllInParallel(5, summaries);
+      List<Map<Path, Footer>> footersFromSummaries = runAllInParallel(configuration.getInt(PARQUET_READ_PARALLELISM, 5), summaries);
       for (Map<Path, Footer> footers : footersFromSummaries) {
         cache.putAll(footers);
       }
@@ -181,6 +183,7 @@ public class ParquetFileReader implements Closeable {
   }
 
   private static <T> List<T> runAllInParallel(int parallelism, List<Callable<T>> toRun) throws ExecutionException {
+    LOG.info("Initiating action with parallelism: " + parallelism);
     ExecutorService threadPool = Executors.newFixedThreadPool(parallelism);
     try {
       List<Future<T>> futures = new ArrayList<Future<T>>();
@@ -230,7 +233,7 @@ public class ParquetFileReader implements Closeable {
       });
     }
     try {
-      return runAllInParallel(5, footers);
+      return runAllInParallel(configuration.getInt(PARQUET_READ_PARALLELISM, 5), footers);
     } catch (ExecutionException e) {
       throw new IOException("Could not read footer: " + e.getMessage(), e.getCause());
     }

