beam-commits mailing list archives

From "Darshan Mehta (JIRA)" <>
Subject [jira] [Created] (BEAM-3771) Unable to write using AvroIO without schema
Date Fri, 02 Mar 2018 16:44:00 GMT
Darshan Mehta created BEAM-3771:

             Summary: Unable to write using AvroIO without schema
                 Key: BEAM-3771
             Project: Beam
          Issue Type: Bug
          Components: beam-model
            Reporter: Darshan Mehta
            Assignee: Kenneth Knowles

I am working on a specific use case where I don't know the schema when writing a PCollection of GenericRecords
to the file system. Here's how the use case works:
 * My Dataflow pipeline listens to a Pub/Sub subscription and receives messages in this format:
   {"schema" : <schema_id>, "payload" : "<payload>"}

 * It then extracts the ID, looks it up in the schema registry, and gets the schema for that specific element
 * The payload is then deserialised into a GenericRecord
 * The PCollection of these records is forwarded to the BigQuery writer, which writes it to BigQuery
 * It is then passed to a storage writer that writes it to the file system using AvroIO
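The first two steps above can be sketched in plain Python to make the message envelope concrete. This is a minimal sketch under stated assumptions. `SCHEMA_REGISTRY`, `parse_envelope`, and `resolve_schema` are hypothetical stand-ins: a dict takes the place of the real schema registry service. Decoding the payload into a GenericRecord is left out because the report does not say how the payload is encoded.

```python
import json

# Hypothetical in-memory stand-in for the schema registry; a real pipeline
# would query its registry service here instead.
SCHEMA_REGISTRY = {
    "user-v1": {
        "type": "record",
        "name": "User",
        "fields": [{"name": "id", "type": "long"}],
    },
}


def parse_envelope(message: str) -> tuple[str, str]:
    """Split a message of the form {"schema": ..., "payload": ...}."""
    envelope = json.loads(message)
    return envelope["schema"], envelope["payload"]


def resolve_schema(schema_id: str) -> dict:
    """Look up the Avro schema for an ID, failing loudly if it is unknown."""
    try:
        return SCHEMA_REGISTRY[schema_id]
    except KeyError:
        raise LookupError(f"unknown schema id: {schema_id}") from None


message = '{"schema": "user-v1", "payload": "<payload>"}'
schema_id, payload = parse_envelope(message)
schema = resolve_schema(schema_id)
print(schema_id, schema["name"])
```

Failing on an unknown ID keeps a bad message from silently reaching the writers. A real pipeline might route such messages to a dead-letter output instead.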

Now I am struggling with the last step, because AvroIO expects a schema and I do not know the schema
at compile time. All I have is a set of elements, each with an embedded schema ID.

Is there any way for AvroIO to write the records to the file system without a schema? If not, are there
any alternative formats I could use to write to the file system?
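The constraint comes from the Avro container format: each file's header stores exactly one writer schema, so records with different schema IDs cannot share one file. One possible direction is to partition the records by schema ID and give each partition its own destination. Beam's Java SDK exposes dynamic destinations for AvroIO for this kind of routing (e.g. `DynamicAvroDestinations`, which supplies a schema per destination). The plain-Python sketch below models only the partitioning step. `group_by_schema`, `destination_for`, and the bucket path are hypothetical names, not Beam APIs.

```python
from collections import defaultdict


def group_by_schema(records):
    """Group (schema_id, record) pairs so that each group shares one writer schema.

    Each group can then be written to its own Avro output, since a single
    Avro container file holds one schema in its header.
    """
    groups = defaultdict(list)
    for schema_id, record in records:
        groups[schema_id].append(record)
    return dict(groups)


def destination_for(schema_id: str, base: str = "gs://example-bucket/avro") -> str:
    """Hypothetical naming scheme: one output prefix per schema ID."""
    return f"{base}/schema={schema_id}/part"


records = [("user-v1", {"id": 1}), ("order-v2", {"sku": "A"}), ("user-v1", {"id": 2})]
for schema_id, batch in group_by_schema(records).items():
    print(destination_for(schema_id), len(batch))
```

Keying the destination on the schema ID keeps the number of output prefixes bounded by the number of schemas in the registry rather than by the number of records.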

This message was sent by Atlassian JIRA
