hudi-commits mailing list archives

From GitBox <...@apache.org>
Subject [GitHub] [hudi] umehrot2 commented on a change in pull request #2925: [HUDI-1879] Fix RO Tables Returning Snapshot Result
Date Tue, 25 May 2021 18:46:52 GMT

umehrot2 commented on a change in pull request #2925:
URL: https://github.com/apache/hudi/pull/2925#discussion_r639097765



##########
File path: hudi-sync/hudi-hive-sync/src/main/java/org/apache/hudi/hive/HiveSyncTool.java
##########
@@ -194,11 +197,17 @@ private void syncSchema(String tableName, boolean tableExists, boolean useRealTi
       String outputFormatClassName = HoodieInputFormatUtils.getOutputFormatClassName(baseFileFormat);
       String serDeFormatClassName = HoodieInputFormatUtils.getSerDeClassName(baseFileFormat);
 
+      Map<String, String> serdeProperties = ConfigUtils.toMap(cfg.serdeProperties);
+      if (readAsOptimized) { // read optimized
+        serdeProperties.put(DefaultHoodieConfig.QUERY_TYPE_OPT_KEY, DefaultHoodieConfig.QUERY_TYPE_READ_OPTIMIZED_OPT_VAL);
+      } else { // read snapshot
+        serdeProperties.put(DefaultHoodieConfig.QUERY_TYPE_OPT_KEY, DefaultHoodieConfig.QUERY_TYPE_SNAPSHOT_OPT_VAL);

Review comment:
       Is there a difference between storing this in serde properties vs. table properties?

##########
File path: hudi-common/src/main/java/org/apache/hudi/common/config/DefaultHoodieConfig.java
##########
@@ -26,6 +26,11 @@
  */
 public class DefaultHoodieConfig implements Serializable {
 
+  public static final String QUERY_TYPE_OPT_KEY = "hoodie.datasource.query.type";

Review comment:
       I agree with @vinothchandar that, if possible, the common module should be unaware of the query types.
   
   Instead of storing `hoodie.datasource.query.type` in the table properties, can we store the table name in the table properties along with its `ro` or `rt` suffix and make the decision based on that? I also see other problems with the current approach: for example, if someone mistakenly changes the `hoodie.datasource.query.type` property on the `ro` table, it can start behaving like a real-time table. To avoid such issues, I think making this decision based on the table name itself (as we have been doing so far) is better.
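   A minimal sketch of the suggestion above: derive the query type from the synced table name rather than from a mutable property. This assumes the read-optimized table carries an `_ro` suffix and the snapshot (real-time) table an `_rt` suffix. The class `QueryTypeFromTableName`, its `QueryType` enum, and `resolve` are hypothetical names for illustration, not part of Hudi.

```java
import java.util.Locale;

/**
 * Hypothetical helper (not part of Hudi): decides the query type from the
 * Hive table name instead of from a table/serde property that a user could edit.
 * Assumption: read-optimized tables end in "_ro", snapshot tables in "_rt".
 */
final class QueryTypeFromTableName {

  enum QueryType { READ_OPTIMIZED, SNAPSHOT }

  // Assumed suffix conventions for the two Hive tables synced for a MOR table.
  private static final String RO_SUFFIX = "_ro";
  private static final String RT_SUFFIX = "_rt";

  static QueryType resolve(String tableName) {
    // Hive stores table names in lower case, so compare case-insensitively.
    String name = tableName.toLowerCase(Locale.ROOT);
    if (name.endsWith(RO_SUFFIX)) {
      return QueryType.READ_OPTIMIZED;
    }
    if (name.endsWith(RT_SUFFIX)) {
      return QueryType.SNAPSHOT;
    }
    // No recognized suffix (e.g. a table synced without one): default to snapshot reads.
    return QueryType.SNAPSHOT;
  }

  public static void main(String[] args) {
    System.out.println(resolve("trips_ro"));
    System.out.println(resolve("trips_rt"));
    System.out.println(resolve("TRIPS_RO"));
  }
}
```

   The design point is that the name is fixed at sync time, so an accidental edit to a property cannot turn the `ro` table into a snapshot table. The trade-off is that the behavior now depends on the naming convention itself.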






