beam-commits mailing list archives

From "Etienne Chauchot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (BEAM-2993) AvroIO.write without specifying a schema
Date Thu, 05 Oct 2017 07:48:00 GMT

    [ https://issues.apache.org/jira/browse/BEAM-2993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16192568#comment-16192568 ]

Etienne Chauchot commented on BEAM-2993:
----------------------------------------

Thanks [~jkff] for your points:
* Yes, it works with the side input example above. What I propose is an improvement to
AvroIO, even though we can work around the limitation using the side input and
{{DynamicAvroDestinations}}.
* The PR that I'm about to send does indeed choose the schema of the "first" element of the
PCollection (a PCollection is not ordered, so "first" is arbitrary). So the schema needs to be
the same for all elements of the PCollection, which is the case in our use case. But the current
implementations, {{write(SCHEMA)}}, {{write(class)}} and {{writeGenericRecords(SCHEMA)}}, also
need all the elements of the PCollection to have {{SCHEMA}} as their schema, because this schema
is passed to the {{TypedWrite}} and then to the {{ConstantAvroDestination}}. Or am I missing
something?
* As PCollection elements have the same schema in our use case, there is no point in grouping
per schema. Moreover, if we have the ability to do {{AvroIO.write()}}, I guess most of the
benefit of having a network schema registry disappears, except maybe for the lazy Avro coder,
to avoid calling {{element.getSchema()}} each time we {{encode}} or {{decode}} an element.

PS: please note that I used {{GenericRecord}} rather than its parent {{IndexedRecord}} to describe
our use case in the previous comments, to stick to the generic object chosen in AvroIO :)
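
To illustrate the "first element" idea from the second point, here is a minimal stand-alone sketch. It is not the code of the PR: {{Rec}} is a hypothetical stand-in for {{GenericRecord}}, the schema is modelled as a plain {{String}}, and {{inferCommonSchema}} is an illustrative name. The check that every element shares the schema is an added assumption, included only to make the "same schema for all elements" precondition explicit.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Objects;

/**
 * Stand-alone sketch: derive one schema from the elements themselves.
 * Rec is a hypothetical stand-in for Avro's GenericRecord; its schema is
 * modelled as a plain String so the example needs no Avro or Beam dependency.
 */
final class SchemaInference {

    /** Hypothetical record type that carries its own schema. */
    static final class Rec {
        private final String schema;
        private final String payload;

        Rec(String schema, String payload) {
            this.schema = Objects.requireNonNull(schema);
            this.payload = payload;
        }

        String getSchema() { return schema; }
        String getPayload() { return payload; }
    }

    /**
     * Returns the schema shared by all records.
     * Takes the schema of whichever record happens to come first, then checks
     * the others against it, so a successful result does not depend on order.
     */
    static String inferCommonSchema(List<Rec> records) {
        if (records.isEmpty()) {
            throw new IllegalArgumentException("cannot infer a schema from an empty collection");
        }
        String schema = records.get(0).getSchema();
        for (Rec r : records) {
            if (!schema.equals(r.getSchema())) {
                throw new IllegalStateException(
                    "mixed schemas: " + schema + " vs " + r.getSchema());
            }
        }
        return schema;
    }

    public static void main(String[] args) {
        List<Rec> records = Arrays.asList(new Rec("User", "alice"), new Rec("User", "bob"));
        System.out.println(inferCommonSchema(records));
    }
}
```

With a mixed collection the sketch fails fast instead of silently writing with whichever schema was seen first; whether the real implementation should validate or simply trust the precondition is a separate design question.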


> AvroIO.write without specifying a schema
> ----------------------------------------
>
>                 Key: BEAM-2993
>                 URL: https://issues.apache.org/jira/browse/BEAM-2993
>             Project: Beam
>          Issue Type: Improvement
>          Components: sdk-java-extensions
>            Reporter: Etienne Chauchot
>            Assignee: Etienne Chauchot
>
> Similarly to https://issues.apache.org/jira/browse/BEAM-2677, we should be able to write
> to Avro files using {{AvroIO}} without specifying a schema at build time. Consider the following
> use case: a user has a {{PCollection<GenericRecord>}} but the schema is only known
> while running the pipeline. {{AvroIO.writeGenericRecords}} needs the schema, but the schema
> is already available in {{GenericRecord}}. We should be able to call {{AvroIO.writeGenericRecords()}}
> with no schema.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
