crunch-user mailing list archives

From Marcin Michalski <mmichal...@ifwe.co>
Subject Best way to pass GenericData.Record from one fn to the next one
Date Wed, 24 Feb 2016 08:17:58 GMT
Hi, is there an easy way to pass GenericData.Record between Fns in Crunch
without explicitly specifying the schema? I want to pass multiple Avro
files with various schemas as input to a single DoFn, which will enhance
the data into a Pair. Later I want to run an aggregation (deduping) Fn on
that data, but I don't want to specify the Schema in between; I just want
to work with GenericData.Record instances. Here is an example:

PCollection<Record> messages =
pipeline.read(From.avroFile("/events/*/20160223/"));

// I don't want to pass the schema instance but rather just work with
// GenericData.Record. Is that possible? Or do I need to use Avros.bytes
// instead and then reconstruct the Record in the next Fn?
messages.parallelDo(new EventEnhancerDoFn(),
Avros.generics(messageSchema)).groupByKey...


Thanks,
Marcin
