crunch-user mailing list archives

From Sankash Shankar <sank...@wealthfront.com>
Subject Re: How to write a generic transform method that will act upon generated avro objects in a generic fashion
Date Tue, 23 Jun 2015 18:43:05 GMT
The problem was solved by David's GenericAvroFunction solution.
Thanks again.
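
For reference, a minimal sketch of the pattern discussed in this thread. It
is not the code that was actually used. To compile without Crunch or Avro on
the classpath, it declares hypothetical stand-ins: FieldRecord (in place of
SpecificRecord), plus reduced Emitter and DoFn types. ToCsvFn, Point,
transformToCsv and GenericTransformSketch are illustrative names as well.
It shows that a static nested function class can carry a type parameter, and
that a generic method with <T extends ...> can replace the wildcard, which
Java does not allow in an expression such as new DoFn<? extends ..., String>().

```java
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

class GenericTransformSketch {

    // Hypothetical stand-in for SpecificRecord: positional field access only.
    interface FieldRecord {
        int fieldCount();
        Object get(int i);
    }

    // Hypothetical, reduced stand-ins for Crunch's Emitter and DoFn.
    interface Emitter<T> {
        void emit(T value);
    }

    abstract static class DoFn<S, T> implements Serializable {
        public abstract void process(S input, Emitter<T> emitter);
    }

    // A named, static, generic function class. T is chosen by each caller;
    // because of type erasure it has no effect on serializability.
    static class ToCsvFn<T extends FieldRecord> extends DoFn<T, String> {
        @Override
        public void process(T input, Emitter<String> emitter) {
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < input.fieldCount(); i++) {
                if (i > 0) {
                    sb.append(',');
                }
                sb.append(input.get(i));
            }
            emitter.emit(sb.toString());
        }
    }

    // Generic method: the type parameter T stands where the wildcard was.
    static <T extends FieldRecord> List<String> transformToCsv(List<T> coll) {
        List<String> out = new ArrayList<>();
        DoFn<T, String> fn = new ToCsvFn<>();
        for (T record : coll) {
            fn.process(record, out::add);
        }
        return out;
    }

    // Illustrative record type with two integer fields.
    static class Point implements FieldRecord {
        private final int x;
        private final int y;

        Point(int x, int y) {
            this.x = x;
            this.y = y;
        }

        public int fieldCount() {
            return 2;
        }

        public Object get(int i) {
            return i == 0 ? x : y;
        }
    }
}
```

In an actual Crunch job the list would be a PCollection<T> and the loop would
be a call to parallelDo(new ToCsvFn<T>(), Avros.strings()). The input
collection would presumably be read with a PType built from the concrete
generated class, for example via Avros.specifics(...), rather than from
SpecificRecord itself.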

On Tue, Jun 23, 2015 at 1:57 AM, Josh Wills <josh.wills@gmail.com> wrote:

> Hey Sankash,
>
> I don't understand a couple of things here:
>
> 1) The init() error in SpecificRecord from your original email: I could
> see that sort of thing being a problem if you were trying to create a
> PType<SpecificRecord> vs. a PType<SomeImplOfSpecificRecord>, but I don't
> get why it would be a problem in defining an ordinary DoFn.
> 2) Why David's suggestion of GenericAvroFunction<T extends
> SpecificRecordBase> wouldn't be serializable.
>
> J
>
> On Mon, Jun 22, 2015 at 3:15 PM, David Ortiz <dortiz@videologygroup.com>
> wrote:
>
>>  How are you getting it into a PCollection?  Whatever you're doing there
>> should work for the function, shouldn't it?
>>
>>  On Jun 22, 2015 6:09 PM, Sankash Shankar <sankash@wealthfront.com>
>> wrote:
>>  Hello,
>>
>>  With regard to your question, we know the class will be one of a
>> pre-defined list of classes, but the exact class will not be known until
>> runtime. In addition, the generic class GenericAvroFunction cannot be
>> declared static with a generic type parameter, which keeps it from being
>> serializable.
>>
>>  Thanks.
>>
>>
>>
>> On Mon, Jun 22, 2015 at 1:23 PM, David Ortiz <dortiz@videologygroup.com>
>> wrote:
>>
>>>  When you actually write the code, will you know what the Avro record
>>> is?  I’ve been able to do something along the lines of
>>>
>>>
>>>
>>> public class GenericAvroFunction<T extends SpecificRecordBase> extends DoFn<T, String> {
>>>   …
>>>   public void process(T input, Emitter<String> emitter) {
>>>     …
>>>   }
>>> }
>>>
>>>
>>>
>>> and then parameterizing it in the various pipelines that use it.  I'm not
>>> sure about making it work at run time, though.
>>>
>>>
>>>
>>> *From:* Sankash Shankar [mailto:sankash@wealthfront.com]
>>> *Sent:* Monday, June 22, 2015 4:18 PM
>>> *To:* user@crunch.apache.org
>>> *Subject:* How to write a generic transform method that will act upon
>>> generated avro objects in a generic fashion
>>>
>>>
>>>
>>> Hello.
>>>
>>>
>>>
>>> I am writing a Crunch job that takes in an arbitrary class that extends
>>> SpecificRecord and performs a transformation on the fields in the class. I
>>> am attempting to write a parallelDo function on these classes, but
>>>
>>> public static PCollection<String> function(PCollection<? extends SpecificRecord> coll) {
>>>   coll.parallelDo(new DoFn<? extends SpecificRecord, String>() {
>>>     ...
>>>   }, Avros.strings());
>>> }
>>>
>>> will not compile given it expects a type at compile time, while
>>>
>>> public static PCollection<String> transformAvroToCsv(PCollection<SpecificRecord> coll) {
>>>   coll.parallelDo(new DoFn<SpecificRecord, String>() {
>>>     @Override
>>>     public void process(SpecificRecord input, Emitter<String> emitter) {
>>>     }
>>>   }, Avros.strings());
>>>   return null;
>>> }
>>>
>>> will fail at run time due to SpecificRecord not having an init constructor.
>>>
>>> What is the standard way for taking in generic Avro records and having a
>>> generic transform method to call on them?
>>>
>>>
>>>
>>> Thanks.
>>>     *This email is intended only for the use of the individual(s) to
>>> whom it is addressed. If you have received this communication in error,
>>> please immediately notify the sender and delete the original email.*
>>>
>>
>>
>
>
