hudi-commits mailing list archives

From GitBox <...@apache.org>
Subject [GitHub] [incubator-hudi] nsivabalan commented on a change in pull request #1402: [HUDI-407] Adding Simple Index
Date Fri, 13 Mar 2020 00:33:27 GMT
nsivabalan commented on a change in pull request #1402: [HUDI-407] Adding Simple Index
URL: https://github.com/apache/incubator-hudi/pull/1402#discussion_r391973225
 
 

 ##########
 File path: hudi-common/src/main/java/org/apache/hudi/common/util/ParquetUtils.java
 ##########
 @@ -103,6 +120,42 @@
     return rowKeys;
   }
 
+  /**
+   * Read the rows with record key and partition path from the given parquet file
+   *
+   * @param filePath      The parquet file path.
+   * @param configuration configuration to build fs object
+   * @return Set Set of row keys matching candidateRecordKeys
+   */
+  public static List<Pair<Pair<String, String>, Option<HoodieRecordLocation>>> fetchRecordKeyPartitionPathFromParquet(Configuration configuration, Path filePath,
+                                                                                                                     String baseInstantTime,
+                                                                                                                     String fileId) {
+    List<Pair<Pair<String, String>, Option<HoodieRecordLocation>>> rows = new ArrayList<>();
+    try {
+      if (!filePath.getFileSystem(configuration).exists(filePath)) {
+        return new ArrayList<>();
+      }
+      Configuration conf = new Configuration(configuration);
+      conf.addResource(FSUtils.getFs(filePath.toString(), conf).getConf());
+      Schema readSchema = HoodieAvroUtils.getRecordKeyPartitionPathSchema();
+      AvroReadSupport.setAvroReadSchema(conf, readSchema);
+      AvroReadSupport.setRequestedProjection(conf, readSchema);
+      ParquetReader reader = AvroParquetReader.builder(filePath).withConf(conf).build();
 
 Review comment:
   As of now, this does not use a lazy record iterator. I plan to add that in a separate patch, since I want to get this one out sooner.
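
   For context, a lazy record iterator hands out one record at a time instead of collecting every row into a list first. The following is only a minimal sketch of that idea and not the follow-up patch itself: `RecordSource` and `LazyRecordIterator` are hypothetical names made up for illustration, and `RecordSource.read()` is assumed to return null at end of input, in the way `ParquetReader.read()` does.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.Queue;

public class LazyRecordIteratorSketch {

  /** Hypothetical stand-in for a reader whose read() returns null at end of input. */
  interface RecordSource<T> {
    T read();
  }

  /** Pulls records from the source on demand instead of materializing a list. */
  static final class LazyRecordIterator<T> implements Iterator<T> {
    private final RecordSource<T> source;
    private T buffered;   // record fetched by hasNext() but not yet returned
    private boolean done; // set once the source has reported end of input

    LazyRecordIterator(RecordSource<T> source) {
      this.source = source;
    }

    @Override
    public boolean hasNext() {
      if (buffered == null && !done) {
        buffered = source.read();
        done = (buffered == null);
      }
      return buffered != null;
    }

    @Override
    public T next() {
      if (!hasNext()) {
        throw new NoSuchElementException();
      }
      T result = buffered;
      buffered = null;
      return result;
    }
  }

  public static void main(String[] args) {
    // An in-memory queue plays the role of the file; poll() returns null when empty.
    Queue<String> keys = new ArrayDeque<>(Arrays.asList("key1", "key2", "key3"));
    RecordSource<String> source = keys::poll;
    Iterator<String> it = new LazyRecordIterator<>(source);
    while (it.hasNext()) {
      System.out.println(it.next());
    }
  }
}
```

   With this shape, at most one record is buffered at a time, so memory use does not grow with the number of rows in the file. A real version would also need to close the underlying reader once iteration finishes.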

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services
