hudi-commits mailing list archives

From GitBox <>
Subject [GitHub] [incubator-hudi] n3nash commented on a change in pull request #1320: [HUDI-571] Add min/max headers on archived files
Date Wed, 12 Feb 2020 18:17:45 GMT
n3nash commented on a change in pull request #1320: [HUDI-571] Add min/max headers on archived

 File path: hudi-client/src/main/java/org/apache/hudi/io/
 @@ -268,6 +270,19 @@ public Path getArchiveFilePath() {
     return archiveFilePath;
+  private void writeHeaderBlock(Schema wrapperSchema, List<HoodieInstant> instants) throws Exception {
+    if (!instants.isEmpty()) {
+      Collections.sort(instants, HoodieInstant.COMPARATOR);
+      HoodieInstant minInstant = instants.get(0);
+      HoodieInstant maxInstant = instants.get(instants.size() - 1);
+      Map<HeaderMetadataType, String> metadataMap = Maps.newHashMap();
+      metadataMap.put(HeaderMetadataType.SCHEMA, wrapperSchema.toString());
+      metadataMap.put(HeaderMetadataType.MIN_INSTANT_TIME, minInstant.getTimestamp());
+      metadataMap.put(HeaderMetadataType.MAX_INSTANT_TIME, maxInstant.getTimestamp());
+      this.writer.appendBlock(new HoodieAvroDataBlock(Collections.emptyList(), metadataMap));
+    }
+  }
   private void writeToFile(Schema wrapperSchema, List<IndexedRecord> records) throws Exception {
 Review comment:
   Move the writing of the header to this part. Rather than appending a separate block,
augment the same DataBlock that holds the archived records with the metadata you want to
push here. We already write the schema in its header, so just add more entries (like above)
to the headers here. You will then be able to read each block and filter based on whether
the block should be considered or not. This is more generic than adding an extra empty log
block to track min/max over the entire file, which is hard anyway since the file keeps growing.
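
   A minimal sketch of the idea, not the actual Hudi implementation. `Block`, `buildBlock`,
`mayContain` and `selectBlocks` are hypothetical stand-ins so the example compiles on its own;
only the `HeaderMetadataType` entry names come from the diff above. It also assumes instant
times are fixed-width timestamp strings, so plain string comparison orders them correctly.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

final class ArchiveBlockHeaderSketch {

  // Stand-in for the header keys; the entry names mirror the diff above.
  enum HeaderMetadataType { SCHEMA, MIN_INSTANT_TIME, MAX_INSTANT_TIME }

  // Hypothetical stand-in for a data block: header entries plus the archived records.
  static final class Block {
    final Map<HeaderMetadataType, String> header;
    final List<String> records;

    Block(Map<HeaderMetadataType, String> header, List<String> records) {
      this.header = header;
      this.records = records;
    }
  }

  // Builds ONE block that carries both the records and the min/max of the
  // instants archived in it, instead of a separate empty header-only block.
  static Block buildBlock(String schema, List<String> instantTimes, List<String> records) {
    Map<HeaderMetadataType, String> header = new EnumMap<>(HeaderMetadataType.class);
    header.put(HeaderMetadataType.SCHEMA, schema);
    if (!instantTimes.isEmpty()) {
      // Assumption: fixed-width timestamps, so lexicographic order equals time order.
      header.put(HeaderMetadataType.MIN_INSTANT_TIME, Collections.min(instantTimes));
      header.put(HeaderMetadataType.MAX_INSTANT_TIME, Collections.max(instantTimes));
    }
    return new Block(header, records);
  }

  // True if the block's [min, max] range overlaps the requested [start, end] range.
  static boolean mayContain(Block block, String start, String end) {
    String min = block.header.get(HeaderMetadataType.MIN_INSTANT_TIME);
    String max = block.header.get(HeaderMetadataType.MAX_INSTANT_TIME);
    if (min == null || max == null) {
      return true; // no range recorded, so the block cannot be skipped safely
    }
    return min.compareTo(end) <= 0 && max.compareTo(start) >= 0;
  }

  // Reader side: keep only the blocks whose header says they may be relevant.
  static List<Block> selectBlocks(List<Block> blocks, String start, String end) {
    List<Block> selected = new ArrayList<>();
    for (Block block : blocks) {
      if (mayContain(block, start, end)) {
        selected.add(block);
      }
    }
    return selected;
  }
}
```

   Because each block describes only its own records, the range never has to be rewritten
as the archive file grows; a reader just checks each block's header and skips what it does
not need.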
