beam-commits mailing list archives

From "Etienne Chauchot (JIRA)" <>
Subject [jira] [Commented] (BEAM-2993) AvroIO.write without specifying a schema
Date Tue, 03 Oct 2017 15:40:00 GMT


Etienne Chauchot commented on BEAM-2993:

You're right, I simplified the use case a bit. :) The complete use case is more complicated.
We generate Beam code, and every collection element is a GenericRecord no matter what the initial
read or the upstream transforms were. We need to write these elements.

But never mind, the core point is this: since any Avro record knows its schema, passing the schema
should not be mandatory for writing, as it is now (either via {{write(schema)}} or {{withSchema}},
which ends up in a {{DynamicAvroDestinations}}, or directly in a custom {{DynamicAvroDestinations}},
as I did in the code above). We should either get the schema from {{DynamicAvroDestinations}}
if it is available, or lazily determine it from the elements themselves just before writing them out.
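
To illustrate the idea (this is only a sketch, not the code from the PR), here is a
self-contained Java example of a sink that uses a configured schema when one is given and
otherwise takes it from the first element it writes. Everything in it is a hypothetical
stand-in: {{LazySchemaSink}}, {{SelfDescribingRecord}} and the string-based schema are invented
for the example, with {{SelfDescribingRecord.getSchema()}} playing the role of
{{GenericRecord.getSchema()}}. Rejecting records with a different schema is a simplification.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch: a sink that learns its schema from the first element if none was configured. */
public class LazySchemaSink {

    /** Hypothetical stand-in for a self-describing record such as Avro's GenericRecord. */
    public interface SelfDescribingRecord {
        String getSchema();   // stands in for GenericRecord.getSchema()
        String getPayload();  // stands in for the encoded record content
    }

    /** Convenience factory for building stand-in records. */
    public static SelfDescribingRecord record(final String schema, final String payload) {
        return new SelfDescribingRecord() {
            @Override public String getSchema() { return schema; }
            @Override public String getPayload() { return payload; }
        };
    }

    private final String configuredSchema;  // may be null: "no schema specified"
    private String effectiveSchema;         // resolved on the first write
    private final List<String> output = new ArrayList<>();

    public LazySchemaSink(String configuredSchema) {
        this.configuredSchema = configuredSchema;
    }

    public void write(SelfDescribingRecord element) {
        if (effectiveSchema == null) {
            // Prefer the configured schema; otherwise take it from the element itself.
            effectiveSchema = (configuredSchema != null) ? configuredSchema : element.getSchema();
            output.add("HEADER:" + effectiveSchema); // the "file header" is written once
        }
        if (!effectiveSchema.equals(element.getSchema())) {
            // Simplification: a real writer could resolve compatible schemas instead.
            throw new IllegalArgumentException(
                "Element schema " + element.getSchema() + " does not match " + effectiveSchema);
        }
        output.add(element.getPayload());
    }

    public String getEffectiveSchema() {
        return effectiveSchema;
    }

    public List<String> getOutput() {
        return output;
    }
}
```

With {{new LazySchemaSink(null)}}, the header is only produced when the first record arrives,
which is the "determine it just before writing" behaviour described above; passing a non-null
schema keeps the current behaviour.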

I'm preparing a PR to do this and I'm almost done. I'll submit it for review if you have a bit
of time.

> AvroIO.write without specifying a schema
> ----------------------------------------
>                 Key: BEAM-2993
>                 URL:
>             Project: Beam
>          Issue Type: Improvement
>          Components: sdk-java-extensions
>            Reporter: Etienne Chauchot
>            Assignee: Etienne Chauchot
> Similarly to, we should be able to write to Avro files using {{AvroIO}} without specifying
> a schema at build time. Consider the following use case: a user has a {{PCollection<GenericRecord>}},
> but the schema is only known while running the pipeline. {{AvroIO.writeGenericRecords}} needs
> the schema, but the schema is already available in each {{GenericRecord}}. We should be able
> to call {{AvroIO.writeGenericRecords()}} with no schema.

This message was sent by Atlassian JIRA
