hudi-commits mailing list archives

From GitBox <...@apache.org>
Subject [GitHub] [incubator-hudi] umehrot2 commented on issue #1406: [HUDI-713] Fix conversion of Spark array of struct type to Avro schema
Date Wed, 25 Mar 2020 23:14:38 GMT
umehrot2 commented on issue #1406: [HUDI-713] Fix conversion of Spark array of struct type
to Avro schema
URL: https://github.com/apache/incubator-hudi/pull/1406#issuecomment-604137304
 
 
   > Sorry, did not mean to hijack this fix. Just trying to understand how it'll break
compatibility while we are here. All this schema namespace business only matters before writing
parquet files, right? Once you are able to write parquet, it should be readable by parquet-avro
for merging (which has nothing to do with apache-spark-avro or databricks-spark-avro).
What causes the breakage?
   
   All I can think of is that, since the old namespace is stored under `parquet.avro.schema` in
the actual parquet file, it might conflict with the new schema, which has a different namespace.
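
   To make the suspected mismatch concrete, here is a minimal sketch in plain Python (no Avro library). It only shows that two record schemas with the same name but different namespaces get different Avro full names; it does not show that this breaks reading or merging, which is still unconfirmed. The record name, the two namespaces, and the helper `avro_fullname` are hypothetical and not taken from Hudi or spark-avro.

```python
import json


def avro_fullname(schema_json: str) -> str:
    """Return the Avro full name of a top-level record schema.

    Per the Avro naming rules, a name that already contains a dot is
    treated as a full name; otherwise the namespace (if any) is prefixed.
    """
    schema = json.loads(schema_json)
    name = schema["name"]
    namespace = schema.get("namespace", "")
    if "." in name or not namespace:
        return name
    return f"{namespace}.{name}"


def make_schema(namespace: str) -> str:
    # Hypothetical record; only the namespace differs between the two versions.
    return json.dumps({
        "type": "record",
        "name": "trip_record",
        "namespace": namespace,
        "fields": [{"name": "id", "type": "string"}],
    })


# Assumption: stands in for the schema string already stored in an existing file.
old_schema = make_schema("old.namespace")
# Assumption: stands in for the schema produced after the conversion change.
new_schema = make_schema("new.namespace")

old_name = avro_fullname(old_schema)
new_name = avro_fullname(new_schema)
print(old_name, new_name, old_name == new_name)
```

   The fields are identical in both versions, so any incompatibility would have to come from how the reader treats the differing full names.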

   @zhedoubushishi is looking into this.
   
   One good thing is that at least it should not affect users who use `FilebasedSchemaProvider`
or `SchemaRegistryProvider` with `DeltaStreamer`, in which case, from what I see, we directly
use the schema that the user has passed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services
