avro-user mailing list archives

From Andrew Kenworthy <adwkenwor...@yahoo.com>
Subject Fw: Collecting union-ed Records in AvroReducer
Date Thu, 08 Dec 2011 15:03:31 GMT




----- Forwarded Message -----
>From: Andrew Kenworthy <adwkenworthy@yahoo.com>
>To: Gaurav Nanda <gaurav324@gmail.com> 
>Sent: Thursday, December 8, 2011 3:47 PM
>Subject: Re: Collecting union-ed Records in AvroReducer
> 
>
>Hello Gaurav,
>
>
>Thank you for your reply. My problem is that the writer is implemented by GenericDatumWriter,
>which is called via Hadoop, i.e. in my code I only have direct access to an AvroCollector object,
>which, several layers later, invokes a GenericDatumWriter. I don't really want to have to
>re-implement a lot of code that the avro-mapred package provides for me.
>
>
>But I think I can get around this by defining my output schema as one with a nested
>record structure, and "embed" my type B record within type A. That way I am emitting
>a single record, albeit one holding a composition of my two entities.
>
>
>Andrew
>
>
>
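A minimal sketch of the nested-record workaround described above, assuming a hypothetical wrapper record `AB` whose fields `a` and `b` hold records of hypothetical types `A` (field `id`) and `B` (field `label`); none of these names come from the thread. One wrapper datum carries both entities, so a single collect call can emit it. The reducer and `AvroJob` wiring is not shown; the comment in `main` only indicates where the datum would go.

```java
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericRecord;

public class WrapperRecordSketch {
    // Hypothetical output schema: record AB embeds record A and record B as fields.
    static final String WRAPPER_JSON =
        "{\"type\":\"record\",\"name\":\"AB\",\"fields\":["
      + "{\"name\":\"a\",\"type\":{\"type\":\"record\",\"name\":\"A\","
      + "\"fields\":[{\"name\":\"id\",\"type\":\"int\"}]}},"
      + "{\"name\":\"b\",\"type\":{\"type\":\"record\",\"name\":\"B\","
      + "\"fields\":[{\"name\":\"label\",\"type\":\"string\"}]}}]}";

    static final Schema WRAPPER = new Schema.Parser().parse(WRAPPER_JSON);

    // Builds one AB datum that holds an A and a B.
    static GenericRecord wrap(int id, String label) {
        GenericRecord a = new GenericData.Record(WRAPPER.getField("a").schema());
        a.put("id", id);
        GenericRecord b = new GenericData.Record(WRAPPER.getField("b").schema());
        b.put("label", label);
        GenericRecord ab = new GenericData.Record(WRAPPER);
        ab.put("a", a);
        ab.put("b", b);
        return ab;
    }

    public static void main(String[] args) {
        GenericRecord ab = wrap(1, "x");
        // Assumption: in an AvroReducer whose output schema is WRAPPER,
        // this single datum would be passed to collector.collect(ab).
        System.out.println(GenericData.get().validate(WRAPPER, ab));
        System.out.println(ab);
    }
}
```

The trade-off is that every output datum now has both fields; if A and B do not always occur together, the fields could be made nullable unions such as `["null", "A"]`.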
>>________________________________
>> From: Gaurav Nanda <gaurav324@gmail.com>
>>To: user@avro.apache.org; Andrew Kenworthy <adwkenworthy@yahoo.com> 
>>Sent: Thursday, December 8, 2011 3:32 PM
>>Subject: Re: Collecting union-ed Records in AvroReducer
>> 
>>You don't need to construct a record object. You can just write your
>>RecordA/RecordB objects directly.
>>
>>Sample Writer:
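>>import java.io.FileOutputStream;
>>import org.apache.avro.file.DataFileWriter;
>>import org.apache.avro.generic.GenericDatumWriter;
>>import org.apache.avro.io.DatumWriter;
>>
>>        // schema must accept every appended datum: "int" for this example,
>>        // or a union of the RecordA and RecordB schemas for your records
>>        DatumWriter<Object> datum = new GenericDatumWriter<Object>(schema);
>>        DataFileWriter<Object> writer = new DataFileWriter<Object>(datum);
>>
>>        FileOutputStream out = new FileOutputStream("h:\\TestFile.avro");
>>
>>        writer.create(schema, out); // writes the file header, including the schema
>>        writer.append(1050324); // You can write your recordA/recordB here.
>>
>>        writer.close(); // flushes buffered data and closes the stream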
>>
>>Sample Reader:
>>
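>>import java.io.File;
>>import org.apache.avro.file.DataFileReader;
>>import org.apache.avro.generic.GenericDatumReader;
>>
>>        File out = new File("h:\\TestFile.avro");
>>        // no schema passed: the reader uses the writer's schema from the file header
>>        GenericDatumReader<Object> datum = new GenericDatumReader<Object>();
>>        DataFileReader<Object> reader = new DataFileReader<Object>(out, datum);
>>
>>        while (reader.hasNext()) {
>>            System.out.println(reader.next());
>>        }
>>        reader.close();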
>>
>>Hope this helps.
>>
>>Thanks,
>>Gaurav Nanda
>>
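Gaurav's point can be sketched end to end with real records: against a union schema, each RecordA or RecordB is appended as its own datum, and each datum takes exactly one branch of the union. This is a minimal sketch under stated assumptions: the union of hypothetical records `A` (field `id`) and `B` (field `label`) stands in for the real schemas, and in-memory byte streams plus `DataFileStream` replace the `h:\TestFile.avro` file and `DataFileReader` from the samples so the example is self-contained.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.avro.Schema;
import org.apache.avro.file.DataFileStream;
import org.apache.avro.file.DataFileWriter;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;

public class UnionFileSketch {
    // Hypothetical union schema: every datum in the file is either an A or a B.
    static final Schema UNION = new Schema.Parser().parse(
        "[{\"type\":\"record\",\"name\":\"A\",\"fields\":[{\"name\":\"id\",\"type\":\"int\"}]},"
      + "{\"type\":\"record\",\"name\":\"B\",\"fields\":[{\"name\":\"label\",\"type\":\"string\"}]}]");

    // Writes one A and one B as two separate datums into an in-memory container file.
    static byte[] writeBoth() throws IOException {
        GenericRecord recordA = new GenericData.Record(UNION.getTypes().get(0));
        recordA.put("id", 1);
        GenericRecord recordB = new GenericData.Record(UNION.getTypes().get(1));
        recordB.put("label", "x");

        ByteArrayOutputStream out = new ByteArrayOutputStream();
        DataFileWriter<Object> writer =
            new DataFileWriter<Object>(new GenericDatumWriter<Object>(UNION));
        writer.create(UNION, out);
        writer.append(recordA); // first datum, encoded as union branch A
        writer.append(recordB); // second datum, encoded as union branch B
        writer.close();
        return out.toByteArray();
    }

    // Reads the container back and collects the record type name of each datum.
    static List<String> readTypeNames(byte[] bytes) throws IOException {
        List<String> names = new ArrayList<String>();
        DataFileStream<Object> reader = new DataFileStream<Object>(
            new ByteArrayInputStream(bytes), new GenericDatumReader<Object>());
        while (reader.hasNext()) {
            names.add(((GenericRecord) reader.next()).getSchema().getFullName());
        }
        reader.close();
        return names;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(readTypeNames(writeBoth()));
    }
}
```

Note that this yields two datums, not one: a union value cannot hold an A and a B at the same time. In a reducer the analogous move would presumably be two collect calls, but that wiring is an assumption not exercised here.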
>>On Thu, Dec 8, 2011 at 5:40 PM, Andrew Kenworthy <adwkenworthy@yahoo.com> wrote:
>>> Hello,
>>>
>>> Is it possible to write/collect a union-ed record from an Avro reducer?
>>>
>>> I have a reduce class (extending AvroReducer), and the output schema is a
>>> union schema of record type A and record type B. In the reduce logic I want
>>> to combine instances of A and B in the same datum, passing it to my
>>> AvroCollector. My code looks a bit like this:
>>>
>>> Record unionRecord = new GenericData.Record(myUnionSchema); // not legal!
>>> unionRecord.put("type A", recordA);
>>> unionRecord.put("type B", recordB);
>>> collector.collect(unionRecord);
>>>
>>> But the GenericData.Record constructor expects a record schema. How can I write
>>> both records such that they appear in the same output datum?
>>>
>>> Andrew
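The constructor fails because an Avro union is not a container: a datum of a union schema is simply a value of exactly one of its branch types. A minimal sketch, assuming a hypothetical two-record union in place of `myUnionSchema`, uses `GenericData.resolveUnion` to show which branch each record is written as, and checks that `GenericData.Record` rejects a non-record schema.

```java
import org.apache.avro.AvroRuntimeException;
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericRecord;

public class UnionBranchSketch {
    // Hypothetical union of two record types, standing in for myUnionSchema.
    static final Schema UNION = new Schema.Parser().parse(
        "[{\"type\":\"record\",\"name\":\"A\",\"fields\":[{\"name\":\"id\",\"type\":\"int\"}]},"
      + "{\"type\":\"record\",\"name\":\"B\",\"fields\":[{\"name\":\"label\",\"type\":\"string\"}]}]");

    // Returns the index of the union branch this datum would be encoded as.
    static int branchOf(Object datum) {
        return GenericData.get().resolveUnion(UNION, datum);
    }

    // True if GenericData.Record refuses the union schema, as reported in the question.
    static boolean recordRejectsUnion() {
        try {
            new GenericData.Record(UNION);
            return false;
        } catch (AvroRuntimeException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        GenericRecord recordA = new GenericData.Record(UNION.getTypes().get(0));
        recordA.put("id", 1);
        GenericRecord recordB = new GenericData.Record(UNION.getTypes().get(1));
        recordB.put("label", "x");
        System.out.println(branchOf(recordA) + " " + branchOf(recordB));
        System.out.println(recordRejectsUnion());
    }
}
```

Because each record maps to exactly one branch, putting both into "the same datum" requires a record schema that contains them as fields.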
>>
>>
>>
>
>