hudi-commits mailing list archives

From GitBox <...@apache.org>
Subject [GitHub] [incubator-hudi] umehrot2 commented on a change in pull request #1427: [HUDI-727]: Copy default values of fields if not present when rewriting incoming record with new schema
Date Wed, 25 Mar 2020 09:06:38 GMT
umehrot2 commented on a change in pull request #1427: [HUDI-727]: Copy default values of fields
if not present when rewriting incoming record with new schema
URL: https://github.com/apache/incubator-hudi/pull/1427#discussion_r397699488
 
 

 ##########
 File path: hudi-common/src/test/java/org/apache/hudi/common/util/TestHoodieAvroUtils.java
 ##########
 @@ -57,4 +60,16 @@ public void testPropsPresent() {
     }
     Assert.assertTrue("column pii_col doesn't show up", piiPresent);
   }
+
+  @Test
+  public void testDefaultValue() {
+    GenericRecord rec = new GenericData.Record(new Schema.Parser().parse(EXAMPLE_SCHEMA));
+    rec.put("_row_key", "key1");
+    rec.put("non_pii_col", "val1");
+    rec.put("pii_col", "val2");
+    rec.put("timestamp", 3.5);
 
 Review comment:
   My bad, I was thinking only from the point of view of the DataSource's `HoodieSparkSqlWriter`,
where the schema is determined automatically from the `DataFrame` and converted to an Avro schema.
I missed that `DeltaStreamer` uses the `schema provider`, which users can pass directly
to the `HoodieWriteClient`. Thanks for the details!
   
   I have a question about the schema evolution example you provided. The `rewriteRecord()`
you are testing here uses the schema from the old record, and rewrites by setting only the
fields found in the old schema. So if you rewrite records R1 and R2, their schema will not
have the new `col1` field, right? Hence, your code for populating default values will not get
executed, because `col1` is not present in the old schema's fields.
   
   It seems this test case works because you are not evolving the schema here. Your old and
new records both have the same schema. But if your old record's schema is different, I think you
will run into the same issue. Am I missing something here?
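
   To make the concern concrete, here is a minimal sketch in plain Java. It is only a
simplified model and not the Avro or Hudi API: a "schema" is an ordered map from field name
to default value, a "record" is a map from field name to value, and the names `RewriteSketch`,
`rewrite`, `oldSchema`, `newSchema`, `r1` and the default `"default_col1"` are all hypothetical.
It shows that the default for `col1` is applied only when the rewrite iterates over the fields
of the evolved schema.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Simplified model, NOT the Avro or Hudi API: a "schema" is an ordered map of
// field name -> default value, and a "record" is a map of field name -> value.
class RewriteSketch {

    // Copies only the fields listed in schemaToIterate; a field missing from the
    // record falls back to that schema's default value.
    static Map<String, Object> rewrite(Map<String, Object> record,
                                       Map<String, Object> schemaToIterate) {
        Map<String, Object> out = new LinkedHashMap<>();
        for (Map.Entry<String, Object> field : schemaToIterate.entrySet()) {
            Object value = record.get(field.getKey());
            out.put(field.getKey(), value != null ? value : field.getValue());
        }
        return out;
    }

    // Old schema: no col1 (hypothetical field set; null means "no default").
    static Map<String, Object> oldSchema() {
        Map<String, Object> s = new LinkedHashMap<>();
        s.put("_row_key", null);
        s.put("timestamp", null);
        return s;
    }

    // Evolved schema: the old fields plus col1 with a hypothetical default.
    static Map<String, Object> newSchema() {
        Map<String, Object> s = oldSchema();
        s.put("col1", "default_col1");
        return s;
    }

    // A record written under the old schema, like R1 in the discussion.
    static Map<String, Object> r1() {
        Map<String, Object> r = new LinkedHashMap<>();
        r.put("_row_key", "key1");
        r.put("timestamp", 3.5);
        return r;
    }

    public static void main(String[] args) {
        // Iterating the old schema never visits col1, so no default is filled in.
        System.out.println(rewrite(r1(), oldSchema()));
        // Iterating the evolved schema visits col1 and applies its default.
        System.out.println(rewrite(r1(), newSchema()));
    }
}
```

   In this model the first call produces a record without `col1` at all, which is the case I
am worried about; a test that covers it would need an old record whose schema differs from
the new one.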

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services
