hudi-commits mailing list archives

From "ASF GitHub Bot (Jira)" <j...@apache.org>
Subject [jira] [Commented] (HUDI-1441) HoodieAvroUtils - rewrite() is not handling evolution of a nested record field.
Date Tue, 07 Sep 2021 21:44:00 GMT

    [ https://issues.apache.org/jira/browse/HUDI-1441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17411541#comment-17411541
] 

ASF GitHub Bot commented on HUDI-1441:
--------------------------------------

hudi-bot commented on pull request #2982:
URL: https://github.com/apache/hudi/pull/2982#issuecomment-914648266


   ## CI report:
   
   * 92ca2e97f51fea8ef906eabcc83eb77facb27c5d UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run travis` re-run the last Travis build
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


> HoodieAvroUtils - rewrite() is not handling evolution of a nested record field.
> -------------------------------------------------------------------------------
>
>                 Key: HUDI-1441
>                 URL: https://issues.apache.org/jira/browse/HUDI-1441
>             Project: Apache Hudi
>          Issue Type: Bug
>          Components: Common Core
>            Reporter: Balajee Nagasubramaniam
>            Priority: Critical
>              Labels: pull-request-available, sev:critical
>
> When a schema has a nested record field and one of that nested record's fields evolves, rewrite() fails with a SchemaCompatibilityException (or an ArrayIndexOutOfBoundsException).
> {{/*
>    *  OldRecord:                     NewRecord:
>    *      field1 : String                field1 : String
>    *      field2 : record                field2 : record
>    *         field_21 : string              field_21 : string
>    *         field_22 : Integer             field_22 : Integer
>    *      field3: Integer                   field_23 : String
>    *                                       field_24 : Integer
>    *                                     field3: Integer
>    *
>    *  When a nested record has changed/evolved, newRecord.put(field2, oldRecord.get(field2)) is not sufficient.
>    *  The evolved field requires a deep copy/rewrite.
>    */}}
> Note 1: Reading the Parquet file with the writer schema should not be a problem, because new fields are substituted with null. The issue appears when the Parquet file is read with the reader schema and then written to a new file with the writer schema.
> Note 2: The Hudi test suite's upsertNode exercises this path (fixed as a workaround in a separate task).
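
The deep-copy/rewrite requirement described above can be illustrated with a minimal sketch. It does not use the Avro or Hudi APIs and is not the fix proposed in the pull request: records and schemas are modeled as plain java.util.Map instances, and the class name NestedRewriteSketch and its methods are hypothetical names chosen for illustration. It only shows why a nested record must be rebuilt field by field against the evolved schema instead of being copied by reference.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Simplified model (an assumption, not Avro): a schema maps each field name to
// either a type name (String) or a nested record schema (Map). A record maps
// each field name to a value, where nested records are themselves Maps.
class NestedRewriteSketch {

    // Rebuild oldRecord against newSchema, recursing into nested records so
    // that fields added to a nested schema show up (as null) in the result.
    @SuppressWarnings("unchecked")
    public static Map<String, Object> rewrite(Map<String, Object> oldRecord,
                                              Map<String, Object> newSchema) {
        Map<String, Object> newRecord = new LinkedHashMap<>();
        for (Map.Entry<String, Object> field : newSchema.entrySet()) {
            String name = field.getKey();
            Object fieldSchema = field.getValue();
            Object oldValue = oldRecord.get(name); // null if the field is new
            if (fieldSchema instanceof Map && oldValue instanceof Map) {
                // Nested record: a plain put(name, oldValue) would keep the old shape.
                newRecord.put(name, rewrite((Map<String, Object>) oldValue,
                                            (Map<String, Object>) fieldSchema));
            } else {
                newRecord.put(name, oldValue);
            }
        }
        return newRecord;
    }

    // OldRecord from the issue description, with made-up sample values.
    public static Map<String, Object> exampleOldRecord() {
        Map<String, Object> nested = new LinkedHashMap<>();
        nested.put("field_21", "b");
        nested.put("field_22", 1);
        Map<String, Object> record = new LinkedHashMap<>();
        record.put("field1", "a");
        record.put("field2", nested);
        record.put("field3", 2);
        return record;
    }

    // NewRecord schema from the issue description: field2 gains two fields.
    public static Map<String, Object> exampleNewSchema() {
        Map<String, Object> nested = new LinkedHashMap<>();
        nested.put("field_21", "string");
        nested.put("field_22", "int");
        nested.put("field_23", "string");
        nested.put("field_24", "int");
        Map<String, Object> schema = new LinkedHashMap<>();
        schema.put("field1", "string");
        schema.put("field2", nested);
        schema.put("field3", "int");
        return schema;
    }

    public static void main(String[] args) {
        System.out.println(rewrite(exampleOldRecord(), exampleNewSchema()));
    }
}
```

In this model the rewritten field2 has all four fields, with field_23 and field_24 set to null, and the original nested map is left untouched. A real fix would apply the same recursion to Avro records and would also have to handle defaults, unions, arrays, and maps, which this sketch ignores.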



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
