crunch-user mailing list archives

From Josh Wills <josh.wi...@gmail.com>
Subject Re: Best way to pass GenericData.Record from one fn to the next one
Date Wed, 24 Feb 2016 15:52:50 GMT
In theory, any PType that supports GenericRecord will work, even a dummy
one that defines a schema that isn't the same as the one you're using.

I don't recommend doing that, of course, but it will work.
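
For illustration only, here is a minimal sketch of what the "dummy schema" variant might look like. The class name GenericPassThrough, the empty "Placeholder" record schema, and PassThroughFn (a stand-in for EventEnhancerDoFn) are all hypothetical names introduced here, not part of the original thread, and the sketch has not been run against a cluster. It only shows the shape of the call; as noted above, this approach is not recommended.

```java
import java.util.Collections;

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.crunch.MapFn;
import org.apache.crunch.PCollection;
import org.apache.crunch.Pipeline;
import org.apache.crunch.io.From;
import org.apache.crunch.types.avro.Avros;

public class GenericPassThrough {

  // Hypothetical placeholder: an empty record schema that deliberately
  // matches none of the input files. It exists only to obtain a PType.
  static Schema placeholderSchema() {
    Schema schema = Schema.createRecord("Placeholder", null, "example", false);
    schema.setFields(Collections.<Schema.Field>emptyList());
    return schema;
  }

  // Hypothetical stand-in for EventEnhancerDoFn: returns each record unchanged.
  static class PassThroughFn extends MapFn<GenericData.Record, GenericData.Record> {
    @Override
    public GenericData.Record map(GenericData.Record record) {
      return record;
    }
  }

  // Reads generic Avro records and hands them to the next Fn using a PType
  // built from the placeholder schema rather than the records' real schema.
  static PCollection<GenericData.Record> enhance(Pipeline pipeline, String inputGlob) {
    PCollection<GenericData.Record> messages = pipeline.read(From.avroFile(inputGlob));
    return messages.parallelDo(new PassThroughFn(), Avros.generics(placeholderSchema()));
  }
}
```

A caller would pass its own Pipeline and input path to enhance(...). Whether a placeholder schema is acceptable once the records have to be written out or shuffled is something to verify on real data before relying on it.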
On Wed, Feb 24, 2016 at 12:18 AM Marcin Michalski <mmichalski@ifwe.co>
wrote:

> Hi, is there an easy way to pass GenericData.Record between Fns in Crunch
> without explicitly stating the schema? I want to pass multiple Avro
> files that have various schemas as input to a single DoFn, which will
> enhance the data into a Pair. Later I want to run an aggregation
> (deduping) Fn on that data, but I don't want to specify the Schema in between
> (I just want to work with GenericData.Record instances). Here is an example:
>
> PCollection<Record> messages =
> pipeline.read(From.avroFile("/events/*/20160223/"));
>
> // I don't want to pass the schema instance but rather just work with
> // GenericData.Record. Is that possible? Or do I need to use Avros.bytes
> // instead and then reconstruct the Record later in the next Fn?
> messages.parallelDo(new EventEnhancerDoFn(),
> Avros.generics(messageSchema)).groupByKey...
>
>
> Thanks,
> Marcin
>
>
