crunch-user mailing list archives

From Josh Wills <josh.wi...@gmail.com>
Subject Re: How to write a generic transform method that will act upon generated avro objects in a generic fashion
Date Tue, 23 Jun 2015 08:57:56 GMT
Hey Sankash,

I don't understand a couple of things here:

1) The init() error in SpecificRecord from your original email: I could see
that sort of thing being a problem if you were trying to create a
PType<SpecificRecord> vs. a PType<SomeImplOfSpecificRecord>, but I don't
get why it would be a problem in defining an ordinary DoFn.
2) Why David's suggestion of GenericAvroFunction<T extends
SpecificRecordBase> wouldn't be serializable.
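
To make point 2 concrete, here is a minimal sketch that uses only the JDK. `RecordBase`, `UserRecord`, `SketchFn`, `GenericRecordFn` and `SerializationSketch` are hypothetical stand-ins (for an Avro-generated base class, a generated record, and a Crunch DoFn, which itself implements Serializable); none of them are Crunch or Avro classes. It only illustrates that a type parameter, being erased at run time, does not by itself stop a top-level generic class from going through Java serialization. It says nothing about what else the real function might hold as fields.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

// Hypothetical stand-in for an Avro-generated base class such as SpecificRecordBase.
abstract class RecordBase {
    abstract String describe();
}

// Hypothetical stand-in for one generated record class.
class UserRecord extends RecordBase {
    private final String id;
    UserRecord(String id) { this.id = id; }
    @Override String describe() { return id; }
}

// Hypothetical stand-in for a Crunch DoFn; like DoFn, it is Serializable.
abstract class SketchFn<S, T> implements Serializable {
    private static final long serialVersionUID = 1L;
    abstract T apply(S input);
}

// Top-level generic function, analogous in shape to GenericAvroFunction<T>.
class GenericRecordFn<T extends RecordBase> extends SketchFn<T, String> {
    private static final long serialVersionUID = 1L;
    private final String prefix;
    GenericRecordFn(String prefix) { this.prefix = prefix; }
    @Override String apply(T input) { return prefix + input.describe(); }
}

final class SerializationSketch {
    // Writes the object with plain Java serialization and reads it back.
    @SuppressWarnings("unchecked")
    static <F extends Serializable> F roundTrip(F fn) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(fn);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (F) in.readObject();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        GenericRecordFn<UserRecord> copy = roundTrip(new GenericRecordFn<UserRecord>("user:"));
        System.out.println(copy.apply(new UserRecord("42")));
    }
}
```

If the real class fails to serialize, the usual suspects are a non-serializable field or an anonymous or inner class that captures its enclosing instance, rather than the type parameter.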

J

On Mon, Jun 22, 2015 at 3:15 PM, David Ortiz <dortiz@videologygroup.com>
wrote:

>  How are you getting it into a PCollection?  Whatever you're doing there
> should work for the function shouldn't it?
>
>  On Jun 22, 2015 6:09 PM, Sankash Shankar <sankash@wealthfront.com> wrote:
>  Hello,
>
>  With regard to your question: we know the class will be one of a
> pre-defined list of classes, but the exact class will not be known until
> runtime. In addition, the generic class GenericAvroFunction cannot be
> defined both statically and with a generic type, which keeps it from being
> serializable.
>
>  Thanks.
>
>
>
> On Mon, Jun 22, 2015 at 1:23 PM, David Ortiz <dortiz@videologygroup.com>
> wrote:
>
>>  When you actually write the code will you know what the avro record
>> is?  I’ve been able to do something along the lines of
>>
>> public class GenericAvroFunction<T extends SpecificRecordBase> extends DoFn<T, String> {
>>   …
>>   public void process(T input, Emitter<String> emitter) {
>>     …
>>   }
>> }
>>
>> then parameterizing it in the various pipelines that use it.  Not sure
>> with regards to making it work at run time though.
>>
>>
>>
>> *From:* Sankash Shankar [mailto:sankash@wealthfront.com]
>> *Sent:* Monday, June 22, 2015 4:18 PM
>> *To:* user@crunch.apache.org
>> *Subject:* How to write a generic transform method that will act upon
>> generated avro objects in a generic fashion
>>
>>
>>
>> Hello.
>>
>>
>>
>> I am writing a Crunch job that takes in an arbitrary class that extends
>> SpecificRecord and performs a transformation on the fields in the class. I
>> am attempting to write a parallelDo function on these classes, but
>>
>> public static PCollection<String> function(PCollection<? extends SpecificRecord> coll) {
>>   coll.parallelDo(new DoFn<? extends SpecificRecord, String>() {
>>     ...
>>   }, Avros.strings());
>> }
>>
>> will not compile given it expects a type at compile time, while
>>
>> public static PCollection<String> transformAvroToCsv(PCollection<SpecificRecord> coll) {
>>   coll.parallelDo(new DoFn<SpecificRecord, String>() {
>>     @Override
>>     public void process(SpecificRecord input, Emitter<String> emitter) {
>>     }
>>   }, Avros.strings());
>>   return null;
>> }
>>
>> will fail at run time due to SpecificRecord not having an init constructor.
>>
>> What is the standard way to take in generic Avro records and call a
>> generic transform method on them?
>>
>>
>>
>> Thanks.
>>     *This email is intended only for the use of the individual(s) to
>> whom it is addressed. If you have received this communication in error,
>> please immediately notify the sender and delete the original email.*
>>
>
>
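
The compile error in the first snippet of the original question is a Java language rule rather than anything Crunch-specific: a wildcard such as `? extends SpecificRecord` cannot appear as a type argument when instantiating or extending a class, so `new DoFn<? extends SpecificRecord, String>() { ... }` is rejected. Naming the type with a method-level type parameter avoids that. The sketch below shows the shape using only the JDK; `FieldRecord`, `Person`, `SketchDoFn` and `GenericTransformSketch` are hypothetical stand-ins for SpecificRecord, a generated record, DoFn and the pipeline code, and a `List` stands in for a PCollection. It does not address the run-time side raised in point 1, where Crunch presumably still needs a PType for a concrete record class.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for Avro's SpecificRecord interface.
interface FieldRecord {
    Object get(int field);
    int fieldCount();
}

// Hypothetical stand-in for one generated record class.
final class Person implements FieldRecord {
    private final String name;
    private final int age;
    Person(String name, int age) { this.name = name; this.age = age; }
    @Override public Object get(int field) {
        if (field == 0) {
            return name;
        }
        return Integer.valueOf(age);
    }
    @Override public int fieldCount() { return 2; }
}

// Hypothetical stand-in for a Crunch DoFn<S, T>; a List plays the role of the Emitter.
abstract class SketchDoFn<S, T> {
    abstract void process(S input, List<T> out);
}

final class GenericTransformSketch {
    // The method-level parameter T gives the anonymous class a nameable type,
    // which a wildcard cannot do.
    static <T extends FieldRecord> List<String> transformToCsv(List<T> coll) {
        SketchDoFn<T, String> fn = new SketchDoFn<T, String>() {
            @Override
            void process(T input, List<String> out) {
                StringBuilder row = new StringBuilder();
                for (int i = 0; i < input.fieldCount(); i++) {
                    if (i > 0) {
                        row.append(',');
                    }
                    row.append(input.get(i));
                }
                out.add(row.toString());
            }
        };
        List<String> rows = new ArrayList<>();
        for (T rec : coll) {
            fn.process(rec, rows);
        }
        return rows;
    }

    public static void main(String[] args) {
        List<Person> people = new ArrayList<>();
        people.add(new Person("ann", 30));
        people.add(new Person("bob", 41));
        System.out.println(transformToCsv(people));
    }
}
```

A caller holding a wildcard-typed collection can still call the method, since capture conversion supplies a type for `T` at the call site.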
