avro-user mailing list archives

From Andrew Kenworthy <adwkenwor...@yahoo.com>
Subject Re: Collecting union-ed Records in AvroReducer
Date Tue, 13 Dec 2011 10:27:40 GMT
Thank you, Scott. That has cleared up some misunderstanding on my part.
I want to emit both records as a Pair, and have now implemented that by
using a Record schema holding two sub-records, one for type A and one for
type B, so I can just write the relevant datum to the correct sub-record,
which gives me exactly what I need.
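
For anyone reading this thread later, a minimal sketch of that arrangement with the generic Java API might look like the following. The outer record name `ABPair` and the inner fields `id` and `count` are assumptions made up for illustration (the thread does not show the real schemas); Avro does require every record schema to carry a name, so the outer record gets one here.

```java
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericRecord;

public class PairRecordSketch {
    // Hypothetical schema: the names ABPair, id and count are illustrative only.
    // The outer record has one field per inner record type.
    static final String PAIR_SCHEMA_JSON =
        "{\"type\":\"record\",\"name\":\"ABPair\",\"fields\":["
      + "{\"name\":\"recordA\",\"type\":{\"type\":\"record\",\"name\":\"RecordA\","
      +   "\"fields\":[{\"name\":\"id\",\"type\":\"string\"}]}},"
      + "{\"name\":\"recordB\",\"type\":{\"type\":\"record\",\"name\":\"RecordB\","
      +   "\"fields\":[{\"name\":\"count\",\"type\":\"int\"}]}}"
      + "]}";

    // Builds one outer datum that contains both an A record and a B record.
    static GenericRecord buildPair(String id, int count) {
        Schema pairSchema = new Schema.Parser().parse(PAIR_SCHEMA_JSON);

        // Each inner record is created from the schema of its own field.
        GenericRecord a = new GenericData.Record(pairSchema.getField("recordA").schema());
        a.put("id", id);
        GenericRecord b = new GenericData.Record(pairSchema.getField("recordB").schema());
        b.put("count", count);

        // The outer record is created from the Record schema, not a union schema.
        GenericRecord pair = new GenericData.Record(pairSchema);
        pair.put("recordA", a);
        pair.put("recordB", b);
        return pair;
    }

    public static void main(String[] args) {
        GenericRecord pair = buildPair("a-1", 42);
        // Check that the datum conforms to the outer record schema.
        System.out.println(GenericData.get().validate(pair.getSchema(), pair));
        System.out.println(pair);
    }
}
```

In the reducer, the outer record would then be passed to a single collector.collect(...) call, with the job's output schema set to the outer Record schema rather than to a union.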

Andrew



> From: Scott Carey <scottcarey@apache.org>
> To: "user@avro.apache.org" <user@avro.apache.org>; Andrew Kenworthy <adwkenworthy@yahoo.com>
> Sent: Thursday, December 8, 2011 6:45 PM
> Subject: Re: Collecting union-ed Records in AvroReducer
>
>On 12/8/11 4:10 AM, "Andrew Kenworthy" <adwkenworthy@yahoo.com> wrote:
>
>
>>Hallo,
>>
>>is it possible to write/collect a union-ed record from an avro reducer?
>>
>>I have a reduce class (extending AvroReducer), and the output schema is a
>>union schema of record type A and record type B. In the reduce logic I
>>want to combine instances of A and B in the same datum, passing it to my
>>AvroCollector. My code looks a bit like this:
>>
>>
>>
>
>If both records were created in the reducer, you can call collect twice,
>once with each record.  Collect in general can be called as many times as
>you wish.
>
>If you want to combine two records into a single datum rather than emit
>multiple datums, you do not want a union; you need a Record.  A union
>datum holds exactly one of its branches at a time.
>
>In short, do you want to emit both records individually or as a pair?  If
>it is a pair, you need a Record; if it is multiple outputs or either/or,
>you need a Union.
>
>
>
>>
>>Record unionRecord = new GenericData.Record(myUnionSchema); // not legal!
>>unionRecord.put("type A", recordA);
>>unionRecord.put("type B", recordB);
>>
>>collector.collect(unionRecord);
>>
>>but the GenericData.Record constructor expects a Record schema. How can I
>>write both records such that they appear in the same output datum?
>
>If your output is either one type or another, see Doug's answer.
>
>For multiple datums, the output schema is a union of two records (a datum
>is either one or the other):
>["RecordA", "RecordB"]
>and the code is:
>
>collector.collect(recordA);
>collector.collect(recordB);
>
>
>If you want a single datum that contains both a RecordA and a RecordB you
>need to have your output schema be a Record with two fields:
>
>{"type":"record", "fields":[
>  {"name":"recordA", "type":"RecordA"},
>  {"name":"recordB", "type":"RecordB"}
>]}
>
>And you would use this record schema to create the GenericRecord, and then
>populate the fields with the inner records, then call collect once with
>the outer record.
>
>Another choice is to make the output an Avro array of the union type,
>which may hold any number of RecordA and RecordB instances in a single
>datum.
>
>>
>>Andrew
>
>
>
>
>