spark-issues mailing list archives

From "Brian Lindblom (JIRA)" <>
Subject [jira] [Created] (SPARK-24855) Built-in AVRO support should support specified schema on write
Date Thu, 19 Jul 2018 00:03:00 GMT
Brian Lindblom created SPARK-24855:

             Summary: Built-in AVRO support should support specified schema on write
                 Key: SPARK-24855
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 2.4.0
            Reporter: Brian Lindblom

spark-avro appears to have been brought in from an upstream project, [] 
I opened a PR a while ago to add support for 'forceSchema', an option that lets the user specify
the AVRO schema with which records are written, to handle some use cases we have.  That PR
was never merged, but I would like to add the feature to the AVRO reader/writer code that
was brought in.  The PR is here and I will follow up with a more formal PR/patch rebased
on the Spark master branch.


This change allows the user to specify a schema, which should be compatible with the schema
that spark-avro generates from the dataset definition.  With it, a user can specify default
values or change union ordering.  It also covers the case where you read in an AVRO data set,
do some in-line field cleansing, and write the result back out: the original schema can be
preserved in the output container files.  I've had several use cases where this behavior
was desired and there were several other asks for this in the spark-avro
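
To make "compatible" concrete, here is a minimal sketch of the kind of check I have in mind. It is not the spark-avro or Spark implementation; the function names, the record name GeneratedRecord and the sample schemas are all hypothetical. It handles flat records only and treats two schemas as compatible when they have the same field names and each field has the same set of type branches, so union reordering and added default values are accepted.

```python
import json


def type_branches(avro_type):
    """Return the set of branch names for an Avro field type, ignoring union order."""
    branches = avro_type if isinstance(avro_type, list) else [avro_type]
    names = set()
    for branch in branches:
        if isinstance(branch, dict):
            # Named types (record, enum, fixed) carry a name; others fall back to "type".
            names.add(branch.get("name", branch["type"]))
        else:
            names.add(branch)
    return frozenset(names)


def is_compatible(generated, specified):
    """Hypothetical check: same field names and same type branches per field.

    Record names, field defaults and union ordering are deliberately ignored.
    """
    gen_fields = {f["name"]: f["type"] for f in generated["fields"]}
    spec_fields = {f["name"]: f["type"] for f in specified["fields"]}
    if gen_fields.keys() != spec_fields.keys():
        return False
    return all(
        type_branches(gen_fields[name]) == type_branches(spec_fields[name])
        for name in gen_fields
    )


# Stand-in for the schema derived from the dataset definition (assumed shape).
generated = json.loads("""
{"type": "record", "name": "GeneratedRecord", "fields": [
  {"name": "id",   "type": "long"},
  {"name": "note", "type": ["string", "null"]}
]}
""")

# User-specified schema: different record name, reordered union, added default.
specified = json.loads("""
{"type": "record", "name": "Event", "fields": [
  {"name": "id",   "type": "long"},
  {"name": "note", "type": ["null", "string"], "default": null}
]}
""")

print(is_compatible(generated, specified))  # True
```

A real check would also need to recurse into nested records, arrays and maps, and to validate that each default value matches its field type.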
