beam-commits mailing list archives

From "Etienne Chauchot (JIRA)" <>
Subject [jira] [Commented] (BEAM-3771) Unable to write using AvroIO without schema
Date Tue, 06 Mar 2018 10:04:00 GMT


Etienne Chauchot commented on BEAM-3771:

Hi [~darshanmehta2], I have a similar use case. I once opened a PR to avoid having to provide an
Avro schema at compile time (see []). That PR was closed because, in some corner cases, deriving
the schema at runtime from the {{GenericRecord}} elements stored in the PCollection can produce
wrong output. See my last comment on this ticket: for details on the corner case.

One solution is to create a {{PCollectionView}} in your pipeline that holds the result of calling
{{getSchema()}} on the elements, and to use it as a side input when processing your regular
PCollection. Here is some sample code: []

I marked this ticket as a duplicate of

> Unable to write using AvroIO without schema
> -------------------------------------------
>                 Key: BEAM-3771
>                 URL:
>             Project: Beam
>          Issue Type: Bug
>          Components: io-java-avro
>            Reporter: Darshan Mehta
>            Assignee: Chamikara Jayalath
>            Priority: Major
>             Fix For: Not applicable
> I am working on a specific use case where I don't know the schema while writing a PCollection
> of GenericRecords to the file system. Here's how the use case works:
>  * My Dataflow job listens to a Pub/Sub subscription and receives messages in this format:
> {code:java}
> // {"schema" : <schema_id>, "payload" : "<payload>"}
> {code}
>  * It then extracts the id, looks up schema registry and gets the schema for a specific
>  * The payload is then deserialised into GenericRecord
>  * PCollection of these records is forwarded to BigQuery writer and it gets written to
>  * It then is passed to Storage writer that writes to file system using AvroIO
> Now, I am struggling with the last step, as AvroIO expects a schema whereas I do not know the
> schema at compile time. All I have is a bunch of elements with the schema id embedded.
> Is there any way for AvroIO to write the records to the file system without a schema? If not,
> do I have any other alternatives (formats) for writing to the file system?
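
The first steps of the description above (reading the envelope and extracting the schema id) can
be sketched in plain Java. This is a minimal sketch under stated assumptions, not code from the
ticket: the class name Envelope and both of its methods are hypothetical, the envelope is assumed
to be flat with a payload that contains no escaped quotes, and a regular expression stands in for
a real JSON parser. Grouping payloads by schema id is shown because an Avro container file carries
a single writer schema, so records with different schemas cannot share one output file.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: none of these names come from the ticket or from Beam.
final class Envelope {
    // Assumed shape: {"schema" : <schema_id>, "payload" : "<payload>"}.
    // The id may be quoted or bare; the payload must not contain escaped quotes.
    private static final Pattern SHAPE = Pattern.compile(
        "\\{\\s*\"schema\"\\s*:\\s*\"?([^\",\\s]+)\"?\\s*,"
            + "\\s*\"payload\"\\s*:\\s*\"([^\"]*)\"\\s*\\}");

    final String schemaId;
    final String payload;

    Envelope(String schemaId, String payload) {
        this.schemaId = schemaId;
        this.payload = payload;
    }

    // Extracts the schema id and the raw payload from one message.
    static Envelope parse(String message) {
        Matcher m = SHAPE.matcher(message.trim());
        if (!m.matches()) {
            throw new IllegalArgumentException("Unexpected envelope: " + message);
        }
        return new Envelope(m.group(1), m.group(2));
    }

    // Groups payloads by schema id, preserving first-seen order of the ids,
    // so that each group can later be written with exactly one schema.
    static Map<String, List<String>> groupBySchemaId(List<String> messages) {
        Map<String, List<String>> groups = new LinkedHashMap<>();
        for (String message : messages) {
            Envelope e = parse(message);
            groups.computeIfAbsent(e.schemaId, k -> new ArrayList<>()).add(e.payload);
        }
        return groups;
    }
}
```

Each group could then be resolved against the schema registry once and written to its own set of
files, which keeps every output file tied to a single schema.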

This message was sent by Atlassian JIRA
