beam-commits mailing list archives

From "Etienne Chauchot (JIRA)" <>
Subject [jira] [Commented] (BEAM-2993) AvroIO.write without specifying a schema
Date Mon, 06 Nov 2017 16:29:00 GMT


Etienne Chauchot commented on BEAM-2993:

Because a PCollection is unordered, one bundle may end up containing only SCHEMA1 records
and another only SCHEMA2 records. If the schema is guessed lazily from the "first" element,
both bundles are written without error: SCHEMA1 is guessed for bundle 1 and SCHEMA2 for
bundle 2. The result is an Avro file that has 2 schemas, which is wrong.
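
To make the failure mode concrete, here is a minimal sketch in plain Java. It does not use Beam or Avro: Rec, LazyWriter, writeBundles and the two bundle helpers are hypothetical stand-ins, and the only assumption is that each bundle gets its own writer, which guesses its schema from the first record it sees.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class LazySchemaDemo {

    /** Hypothetical stand-in for a GenericRecord: it only carries the name of its schema. */
    public static final class Rec {
        final String schema;
        final String value;

        public Rec(String schema, String value) {
            this.schema = schema;
            this.value = value;
        }
    }

    /** Hypothetical per-bundle writer that guesses its schema from the first record. */
    public static final class LazyWriter {
        private String schema; // null until the first record arrives

        public void write(Rec r) {
            if (schema == null) {
                schema = r.schema; // lazy guess at the "first" element
            } else if (!schema.equals(r.schema)) {
                // A mismatch is only detectable inside a single bundle.
                throw new IllegalStateException("expected " + schema + " but got " + r.schema);
            }
        }

        public String schema() {
            return schema;
        }
    }

    /** Writes each bundle with a fresh writer and returns the distinct schemas guessed. */
    public static Set<String> writeBundles(List<List<Rec>> bundles) {
        Set<String> guessed = new LinkedHashSet<>();
        for (List<Rec> bundle : bundles) {
            LazyWriter writer = new LazyWriter();
            for (Rec r : bundle) {
                writer.write(r);
            }
            if (writer.schema() != null) {
                guessed.add(writer.schema());
            }
        }
        return guessed;
    }

    /** Two bundles, each internally consistent, but with different schemas. */
    public static List<List<Rec>> homogeneousBundles() {
        List<List<Rec>> bundles = new ArrayList<>();
        bundles.add(List.of(new Rec("SCHEMA1", "a"), new Rec("SCHEMA1", "b")));
        bundles.add(List.of(new Rec("SCHEMA2", "c"), new Rec("SCHEMA2", "d")));
        return bundles;
    }

    /** One bundle that mixes both schemas. */
    public static List<List<Rec>> mixedBundle() {
        List<List<Rec>> bundles = new ArrayList<>();
        bundles.add(List.of(new Rec("SCHEMA1", "a"), new Rec("SCHEMA2", "c")));
        return bundles;
    }

    public static void main(String[] args) {
        // No error is raised, yet two different schemas were guessed.
        System.out.println("schemas guessed: " + writeBundles(homogeneousBundles()));

        // Only a mixed bundle triggers the check.
        try {
            writeBundles(mixedBundle());
        } catch (IllegalStateException e) {
            System.out.println("mixed bundle rejected: " + e.getMessage());
        }
    }
}
```

In this sketch the mismatch is caught only when both schemas happen to land in the same bundle. Since bundling is not under the user's control, the lazy guess cannot be relied on to detect a mixed PCollection.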

> AvroIO.write without specifying a schema
> ----------------------------------------
>                 Key: BEAM-2993
>                 URL:
>             Project: Beam
>          Issue Type: Improvement
>          Components: sdk-java-extensions
>            Reporter: Etienne Chauchot
>            Assignee: Etienne Chauchot
> Similarly to, we should be able to write to avro files using {{AvroIO}} without
> specifying a schema at build time. Consider the following use case: a user has a
> {{PCollection<GenericRecord>}} but the schema is only known while running the pipeline.
> {{AvroIO.writeGenericRecords}} needs the schema, but the schema is already available
> in {{GenericRecord}}. We should be able to call {{AvroIO.writeGenericRecords()}}
> with no schema.

This message was sent by Atlassian JIRA